Preface


This book is an introduction to quantum algorithms unlike any other. It is short, 
yet it is comprehensive and covers the most important and famous quantum 
algorithms; it assumes minimal background, yet is mathematically rigorous; 
it explains quantum algorithms, yet steers clear of the notorious philosophical 
problems and issues of quantum mechanics. 

We assume no background in quantum theory, quantum mechanics, or quantum
anything. None. Quantum computation can be described in terms of elementary
linear algebra, so some familiarity with vectors, matrices, and their
basic properties is required. However, we will review all that we need from 
linear algebra, which is surprisingly little. If you need a refresher, then our 
material should be enough; if you are new to linear algebra, then we suggest 
some places where you can find the required material. It is really not much, so 
do not worry. 

We do assume that you are comfortable with mathematical proofs; that is, 
we assume “mathematical maturity” in a way that is hard to define. Our proofs 
are short and straightforward, except for advanced topics in section 13.5 and 
chapters 15 and 16. This may be another surprise: for all the excitement about 
quantum algorithms, it is interesting that the mathematical tools and methods
used are elementary. The proofs are neat, clever, and interesting, but you
should have little trouble following the arguments. If you do, it is our fault— 
we hope that our explanations are always clear. Our idea of a standard course 
runs through section 13.4, possibly including chapter 14. 

We strive for mathematical precision. There is always a fine line between 
being complete and clear and being pedantic—hopefully we stay on the right 
side of this. We started with the principle of supplying all the details—all of 
them—on all we present. We have compromised in three places, all having to 
do with approximations. The first is our using the quantum Fourier transform 
“as-is” rather than approximating it, and the others are in chapters 15 and 16. 

For better focus on the algorithms, we chose to de-emphasize quantum circuits.
In fact, we tried to avoid quantum circuits and particularities of quantum
gates altogether. However, they are excellent for illuminating linear algebra, so
we have provided a rich set of exercises in chapters 3 through 7, plus two
popular applications in section 8.3. These can in fact be used to support coverage
of quantum circuits in a wider-scale course. The same goes for complexity 
classes. We prefer to speak operationally in terms of feasible computation, and 
we try to avoid being wedded to the “asymptotically polynomially bounded”
definition of it. We avoid naming any complexity class until chapter 16.
Nevertheless, that chapter has ample complexity content anchored in computational
problems rather than machine models and is self-contained enough to support 
a course that covers complexity theory. At the same stroke, it gives algebraic 
tools for analyzing quantum circuits. We featured tricks we regard as algorithmic
in the main text and delegated some tricks of implementation to exercises.

What makes an algorithm a quantum algorithm? The answer should have 
nothing to do with how the algorithm is implemented in a physical quantum 
system. We regard this as really a question about how programming notation— 
mathematical notation—represents the feasibility of calculations in nature. 
Quantum algorithms use algebraic units called qubits that are richer than bits, 
by which they are allowed to count as feasible some operations that when written
out in simple linear algebra use exponentially long notation. The rules for
these allowed operations are specified in standard models of quantum computation,
which are all equivalent to the one presented in this book. It might seem
ludicrous to believe that nature in any sense uses exponentially long notation, 
but some facet of this appears at hand because quantum algorithms can quickly 
solve problems that many researchers believe require exponential work by any 
“classical” algorithm. In this book, classical means an algorithm written in the 
notation for feasible operations used by every computer today. 

This leads to a word about our presentation. Almost all summaries, notes, 
and books on quantum algorithms use a special notation for vectors and matrices.
This is the famous Dirac notation that was invented by Paul Dirac, who
else. It has many advantages and is the de facto standard in the study of quantum
algorithms. It is a great notation for experts and instrumental to becoming
an expert, but we suspect it is a barrier for those starting out who are not
experts. Thus, we avoid using it, except for a few places toward the end to give 
a second view of some complicated states. Our thesis is that we can explain 
quantum algorithms without a single use of this notation. Essentially this book 
is a testament to that belief: if you find this book more accessible than others,
then we believe it owes to this decision. Our notation follows certain ISO
recommendations, including boldface italics for vectors and heavy slant for 
matrices and operators. 

We hope you will enjoy this little book. It can be used to gain an understanding
of quantum algorithms by self-study, as a course or seminar text, or even
as additional material in a general course on algorithms. 


Richard J. Lipton, Georgia Institute of Technology
Kenneth W. Regan, University at Buffalo (SUNY)
Introduction


One of the great scientific and engineering questions of our time is: 


Are quantum computers possible? 


We can build computers out of mechanical gears and levers, out of electric 
relays, out of vacuum tubes, out of discrete transistors, and finally today out 
of integrated circuits that contain thousands of millions of individual transistors.
In the future, it may be possible to build computers out of other types of
devices—who knows. 

All of these computers, from mechanical to integrated-circuit-based ones, 
are called classical. They are all classical in that they implement the same type 
of computer, albeit as the technology gets more sophisticated the computers 
become faster, smaller, and more reliable. But they all behave in the same way, 
and they all operate in a non-quantum regime. 

What distinguishes these devices is that information is manipulated as bits,
which already have determinate values of 0 or 1. Ironically, the key components
of today’s computers are quantum devices. Both the transistor and its
potential replacement, the Josephson junction, won a Nobel Prize for the quantum
theory of their operation. So why is their regime non-quantum? The reason
is that the regime reckons information as bits. 

By contrast, quantum computation operates on qubits, which are based on
complex-number values, not just 0 and 1. They can be read only by measuring, 
and the readout is in classical bits. To skirt the commonly bandied notion of 
observers interfering with quantum systems and postpone the discussion of 
measurement as an operation, we offer the metaphor that a bit is what you get 
by “cooking” a qubit. From this standpoint, doing a classical computation on 
bits is like cooking the ingredients of a pie individually before baking them 
together in the pie. The quantum argument is that it’s more expedient to let 
the filling bubble in its natural state while cooking everything at once. The 
engineering problem is whether the filling can stay coherent long enough for 
this to work. 

The central question is whether it is possible to build computers that are 
inherently quantum. Such computers would exploit the power and wonder of 
nature to create systems that can effectively be in multiple states at once. They 
open a world with apparent actions at a distance that the great Albert Einstein
never believed but that actually happen, a world with other strange and
counter-intuitive effects. To be sure, this is the world we live in, so the question 
becomes how much of this world our computers can enact. 
This question is yet to be resolved. Many believe that such machines will be
built one day. Some others have fundamental doubts and believe there are physical
limits that make quantum computers impossible. It is currently unclear
who is right, but whatever happens will be interesting: a world with quantum 
computers would allow us to solve hard problems, while a barrier to them 
might shed light on deep questions of physics and information. 

Happily this question does not pose a barrier to us. We plan to study quantum
algorithms, which are interesting whether quantum computers are built
soon, in the next ten years, in the next fifty years, or never. The area of quantum
algorithms contains some beautiful ideas that everyone interested in computation
should know.

The rationale for this book is to supply a gentle introduction to quantum
algorithms. We will say nothing more about quantum computers (about
whether they will be built or how they may work) until the end. We will only
discuss algorithms. 

Our goal is to explain quantum algorithms in a way that is accessible to 
almost anyone. Curiously, while quantum algorithms are quite different from 
classical ones, the mathematical tools needed to understand them are quite 
modest. The mathematics that is required to understand them is linear algebra: 
vectors, matrices, and their basic properties. That is all. So these are really 
linear-algebraic algorithms. 


1.1 The Model 

The universe is complex and filled with strange and wonderful things. From 
lifeforms like viruses, bacteria, and people; to inanimate objects like computers,
airplanes, and bridges that span huge distances; from planets, to whole
galaxies. There is mystery and wonder in them all. 

The goal of science in general, and physics specifically, is to explore and 
understand the universe by discovering the simplest laws possible that explain 
the multitude of phenomena. The method used by physics is the discovery of 
models that predict the behavior of all from the smallest to the largest objects. 
In ancient times, the models were crude: the earliest models “explained” all 
by reducing everything to earth, water, wind, and fire. Today, the models are 
much more refined—they replace earth and water by hundreds of particles and 
wind and fire by the four fundamental forces. Mainly, the models are better at 
predicting, that is, in channeling reproducible knowledge. Yet the full theory,
the long-desired theory of everything, still eludes us. 

Happily, in this introduction to quantum algorithms, we need only a simple
model of how part of the universe works. We can avoid relativity, special
and general; we can avoid the complexities of the Standard Model of particle
physics, with its hundreds of particles; we can even avoid gravity and
electromagnetism. We cannot quite go back to earth, water, wind, and fire, but we can
avoid having to know and understand much of modern physics. This avowal 
of independence from physical qualities does not prevent us from imagining 
nature’s workings. Instead, it speaks to our belief that algorithmic considerations
in information processing run deeper.


1.2 The Space and the States 


So what do we need to know? We need to understand that the state of our
quantum systems will always be described by a single unit vector $\boldsymbol{a}$ that lies in
some fixed vector space of dimension $N = 2^n$, for some $n$. That is, the state is
always a vector

$$\boldsymbol{a} = \begin{bmatrix} a_0 \\ \vdots \\ a_{N-1} \end{bmatrix}$$

where each entry is a real or complex number depending on whether the
space is real or complex. Each entry is called an amplitude. We will not need
to involve the full quantum theory of mixed states, which are formally the same
as classical probability distributions over states like $\boldsymbol{a}$, which are called pure
states. That is, we consider only pure states in this text.

We must distinguish between general states and basis states, which form
a linear-algebra basis composed of configurations that we may observe. In the
standard basis, the basis states are denoted by the vectors $\boldsymbol{e}_k$ whose entries are
0 except for a 1 in place $k$. We identify $\boldsymbol{e}_k$ with the index $k$ itself in $[0, N-1]$
and then further identify $k$ with the $k$-th string $x$ in a fixed total ordering of
$\{0,1\}^n$. That the basis states correspond to all the length-$n$ binary strings is
why we have $N = 2^n$. The interplays among basis vectors, numerical indices,
and binary strings encoding objects are drawn more formally in chapter 2.

Any vector that is not a basis state is a superposition. Two or more basis
states have nonzero amplitude in any superposition, and only one of them can
be observed individually in any measurement. The amplitude $a_k$ is not directly
meaningful for what we may expect to observe but rather its squared absolute
value, $|a_k|^2$. This gives the probability of measuring the system to be in the
basis state $\boldsymbol{e}_k$. The import of $\boldsymbol{a}$ being a unit vector is that these probabilities
sum to 1, namely, $\sum_{k=0}^{N-1} |a_k|^2 = 1$.
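
As a small illustration of how amplitudes turn into measurement probabilities, here is a minimal Python sketch. The function name `measurement_probabilities` and the four-entry example state are our own assumptions for illustration; they do not come from the text.

```python
import math

def measurement_probabilities(a):
    """Return the list of |a_k|^2 for a state given as a list of amplitudes."""
    probs = [abs(amp) ** 2 for amp in a]
    # A valid state is a unit vector, so the probabilities must sum to 1.
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("state is not a unit vector")
    return probs

# A hypothetical state with N = 4 (n = 2) and complex amplitudes.
state = [0.5, 0.5j, -0.5, -0.5j]
probs = measurement_probabilities(state)
print(probs)  # each basis state is equally likely in this example
```

Note that the signs and the factors of i in the amplitudes do not affect the probabilities; only the absolute values do.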

For $N = 2$, the idea that the length of a diagonal line defined by the origin
and a point in the plane involves the sum of two squares is older than Pythagoras.
The length is 1 precisely when the point lies on the unit circle. We may
regard the basis state $\boldsymbol{e}_0$ as lying on the $x$-axis, while $\boldsymbol{e}_1$ lies on the $y$-axis. Then
measurement projects the state $\boldsymbol{a}$ either onto the “$x$ leg” of the triangle it makes
with the $x$-axis or the “$y$ leg” of the triangle along the $y$-axis.

It may still seem odd that the probabilities are proportional not to the lengths
of the legs but to their squares. But we know that if the angle is $\theta$ from the near
part of the $x$-axis (so $0 \le \theta \le \pi/2$), then the lengths are $\cos(\theta)$ and $\sin(\theta)$,
respectively, and it is $\cos^2(\theta) + \sin^2(\theta)$, not $\cos(\theta) + \sin(\theta)$, that sums to 1.
If we wanted to use points whose legs sum directly to 1, we’d have to use the
diamond that is inscribed inside the circle. In $N$-dimensional space, we’d have
to use the $N$-dimensional simplex rather than the sphere. Well, the simplex is
spiky and was not really studied until the 20th century, whereas the sphere is
smooth and nice and was appreciated by the ancients. Evidently nature agrees
with ancient aesthetics. We may not know why the world works this way, but
we can certainly say, why not?
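
The contrast between the circle and the diamond can be checked numerically. The short sketch below (the helper name `leg_sums` is ours) compares the sum of the squared legs with the sum of the legs themselves for a few angles in the first quadrant.

```python
import math

def leg_sums(theta):
    """Return (sum of squared legs, sum of legs) for the unit-circle point at angle theta."""
    x, y = math.cos(theta), math.sin(theta)
    return x * x + y * y, x + y

for theta in (0.0, math.pi / 6, math.pi / 4, math.pi / 2):
    squares, plain = leg_sums(theta)
    print(f"theta={theta:.4f}  squares={squares:.6f}  plain={plain:.6f}")
```

The squared legs always sum to 1 up to rounding, while the plain sum exceeds 1 everywhere strictly between the two axes.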

Once we agree, all we really need to know about the space is that it supports
the picture of Pythagoras, that is, ordinary Euclidean space. Both the real
vector spaces $\mathbb{R}^N$ and the complex ones $\mathbb{C}^N$ do so. That is, they agree on how
many components their vectors have and how distances are measured by taking
squares of values from the vector components. They differ only on what kind of
numbers these component values $v$ can be, but the idea of the norm or absolute
value $|v|$ quickly reconciles this difference. This aspect was first formalized by
Euclid’s great rigorizer of the late 19th and early 20th centuries, David Hilbert;
in his honor, the common concept is called a Hilbert space. Hilbert’s concept
retains its vigor even if “$N$” can be infinite, or if the “vectors” and “components”
are strange objects, but in this book, we need not worry: $N$ will always
be finite, and the space $\mathbb{H}_N$ will be $\mathbb{R}^N$ or $\mathbb{C}^N$. Allowing the latter is the reason
that we say “Hilbert space” not “ordinary Euclidean space.”
1.3 The Operations

In fact, our model embraces the circle and the sphere even more tightly. It 
throws out—it disallows—all other points. Every state a must be a point on 
the unit sphere. If you have heard the term “projective space,” then that is what 
we are restricting to. But happily we need not worry about restricting the points 
so long as we restrict the operations in our model. The operations must map 
any point on the unit sphere to some (other) point on the unit sphere. 

We will also restrict the operations to be linear. Apart from measurements,
they must be onto the whole sphere, which makes them map all of $\mathbb{H}_N$ onto
all of $\mathbb{H}_N$. By the theory of vector subspaces, this means the operations must
also be invertible. Operations with all these properties are called unitary.

We will represent the operations by matrices, and we give several equivalent 
stipulations for unitary matrices in chapter 3, followed by examples in chapter 5
and tricks for working with them in chapter 6. But we can already understand
that compositions of unitary operations are unitary, and their representations
and actions can be figured by the familiar idea of multiplying matrices.
Thus, our model’s programs will simply be compositions of unitary matrices. 
The one catch is that the matrices themselves will be huge, out of proportion to 
the actual simplicity of the operation as we believe nature meters it. Hence, we 
will devote time in chapters 4 and 5 to ensuring these operations are feasible 
according to standards already well accepted in classical computation. 

Thus, we can comply with the requirements for any computational model:
we must understand what state the computation starts in, how it moves
from one state to another, and how we get information out of the computation.

Start: We will ultimately be able to assume that the start state is always the
elementary vector

$$\boldsymbol{e}_0 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

of length $N$. Because the first binary string in our ordering of $\{0,1\}^n$ will be $0^n$,
our usual start state will denote the binary string of $n$ zeros.

Move: If the system is in some state $\boldsymbol{a}$, then we can move it by applying
a unitary transformation $U$. Thus, $\boldsymbol{a}$ will move to $\boldsymbol{b}$ where $\boldsymbol{b} = U\boldsymbol{a}$. Not all
unitary transformations are allowed, but we will get to that later. Note that if $\boldsymbol{a}$
is a unit vector, then so is $\boldsymbol{b}$.
End: We get information out of the quantum computation by making a measurement.
If the final state is $\boldsymbol{c}$, then $k$ is seen with probability $|c_k|^2$. Note that
the output is just the index $k$, not the probability of the index. Often we will
have a distinguished set $S$ of indices that stand for accept, with outcomes in
$[0, N-1] \setminus S$ standing for reject.

That is it. 
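
The three steps can be traced on a tiny example. The following is a minimal sketch in plain Python, not the book's own formulation: the helper names `apply`, `is_unitary`, and `probabilities` are ours, and the particular 2 x 2 matrix `H` (commonly called the Hadamard matrix) is an assumed example of a unitary operation that the text has not introduced at this point.

```python
import math

def apply(U, a):
    """Move step: return b = U a for a matrix U (list of rows) and vector a."""
    return [sum(U[r][c] * a[c] for c in range(len(a))) for r in range(len(U))]

def is_unitary(U, tol=1e-9):
    """Check that the columns of U are orthonormal, i.e. U* U is the identity."""
    n = len(U)
    for i in range(n):
        for j in range(n):
            dot = sum(complex(U[k][i]).conjugate() * U[k][j] for k in range(n))
            if abs(dot - (1 if i == j else 0)) > tol:
                return False
    return True

def probabilities(c):
    """End step: outcome k is seen with probability |c_k|^2."""
    return [abs(ck) ** 2 for ck in c]

h = 1 / math.sqrt(2)
H = [[h, h], [h, -h]]      # an example unitary for N = 2 (our choice)

e0 = [1, 0]                # Start: the elementary vector e_0
b = apply(H, e0)           # Move: b = H e_0
c = apply(H, b)            # Move again: composing two unitaries
print(probabilities(b))    # the two outcomes are equally likely
print(probabilities(c))    # back at e_0, so outcome 0 is essentially certain
```

Applying `H` twice returns the start state, which illustrates that a composition of unitary operations is again unitary and can be computed by ordinary matrix multiplication.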


1.4 Where Is the Input? 

In any model of computation, we expect that there is some way to input information
into the model’s devices. That seems, at first glance, to be missing from
our model. The start state is fixed, and the output method is fixed, so where do
we put the input? The answer is that the input can be encoded by the choice
of the unitary transformation U, in particular by the first several unitary operations
in U, so that for different inputs we will apply different transformations.

This can easily be done in the case of classical computations. We can dispense
with explicit input provided we are allowed to change the program each
time we want to solve a different problem. Consider a program of this form: 

M = 21; 

x = Factor(M); 

procedure Factor(z) { ... } 

Clearly, if we can access the value of the variable x, then we can determine
the factors of 21. If we want to factor a more interesting number such as 35
or 1,001 or 11,234,143, then we can simply execute the same program with M
set to that number.
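
For readers who want to run something, here is one way the schematic program might look in Python. The body of `factor` is our assumption: the text leaves the procedure unspecified, and plain trial division is used here only as a stand-in.

```python
def factor(z):
    """Return the prime factors of z, with multiplicity, by trial division."""
    factors = []
    d = 2
    while d * d <= z:
        while z % d == 0:
            factors.append(d)
            z //= d
        d += 1
    if z > 1:
        factors.append(z)
    return factors

M = 21          # the "input" lives in the program text itself
x = factor(M)
print(x)
```

To solve a different instance, one edits the line that sets `M` and runs the program again; nothing is read from outside.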

This integration of inputs with programs is more characteristic of quantum 
than classical computation. Think of the transformation U as the program, so 
varying U is exactly the same as varying the above program in the classical 
case. We will show general ways of handling inputs in chapter 6, while for 
several famous algorithms, in chapters 8-10, the input is expressly given as a 
transformation that is dropped into a larger program. Not to worry: our chapter 7
in the middle also provides support for the classical notion of feeding a
binary string x directly as the input, while also describing the ingredients of
the “quantum power” that distinguish these programs from classical ones.
1.5 What Exactly Is the Output?

The problematic words in this section title are the two short ones. With apologies
to President Bill Clinton, the question is not so much what the meaning
of “is” is, as what the meaning of “the” is. In a classical deterministic model 
of computation, every computation has one, definite, output. Even if the model 
is randomized, every random input still determines a single computation path, 
whose output is definite before its last step is executed. 

A quantum computation, however, presents the user at the end with a slot 
machine. Measuring is pulling the lever to see what output the wheels give 
you. Unlike in some old casinos where slot machines were rigged, you control 
how the machine is built, and hopefully you’ve designed it to make the probabilities
work in your favor. As with the input, however, the focus shifts from
a given binary string to the machinery itself, and to the action of sampling a 
distribution that pulling the measurement lever gives you. 
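
For small cases the slot-machine picture can be imitated classically. The sketch below is only an illustration under our own assumptions: the function name `sample`, the fixed seed, and the example distribution are ours, and the distribution is supplied directly rather than produced by a quantum computation.

```python
import random
from collections import Counter

def sample(probs, shots, seed=0):
    """Pull the measurement lever `shots` times; return how often each index was seen."""
    rng = random.Random(seed)          # fixed seed so that runs are repeatable
    outcomes = rng.choices(range(len(probs)), weights=probs, k=shots)
    return Counter(outcomes)

# A hypothetical final distribution over N = 2 outcomes.
counts = sample([0.5, 0.5], shots=1000)
print(counts)
```

Each pull yields a single index, never the probability itself; the probabilities only become visible in the tallies over many pulls.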

In chapter 6, we will also finesse the issue of having measurements in the 
middle of computations that continue from states projected down to a lower-dimensional
space. This could be analogized to having a slot machine still spinning
after one wheel has fixed its value. We show why measurements may generally
be postponed until the end, but the algorithms in this textbook already
behave that way. This helps focus the idea that the ultimate goal of quantum 
computation is not a single output but rather a sampling device. Chapter 6 also 
lays groundwork for how those devices can be re-used to improve one’s chance 
of success. Chapter 7 lays out the format for how we present and analyze quantum
algorithms and gives further attention to entanglement, interference, and
measurement. 

All of this remains mere philosophy, however, unless we can show how 
the results of the sampling help solve concrete problems efficiently in distinguished
ways. These solutions are the ultimate outputs, as exemplified in chapters
8-10. In chapters 11 and 12, the outputs are factorizations of numbers, via
the algorithm famously discovered by Peter Shor. In chapters 13-15, they are 
objects that we need to search for in a big search space. Chapter 16 branches 
out to topics in quantum complexity theory, defining the class BQP formally 
and proving upper and lower bounds on it in terms of classical complexity. 
Chapter 17 summarizes the algorithms and discusses some further topics and 
readings. Saying this completes the application of the model and the overview 
of this book. 
1.6 Summary and Notes

In old Platonic fashion, we have tried to idealize quantum computation by pure 
thought apart from physical properties of the world around us. Of course one 
can make errors that way, as Aristotle showed by failing to climb a tower and 
drop equal-sized balls of unequal weights. It is vain to think that Pythagoras or 
Euclid or Archimedes could have come up with such a computational model 
but maybe not so vain to estimate it of Hilbert. Linear algebra and geometry 
were both deepened by Hilbert. Meanwhile, the quantum theory emerged and 
ideas of computation were extensively discussed long before Alan Turing gave 
his definitive classical answer (Turing, 1936). 

The main stumbling block may have been probability. Quantum physicists 
were forced to embrace probability from the get-go, in a time when Newtonian 
determinism was dominant and Einstein said “God”—that is, nature—“does 
not play dice.” Some of our algorithms will be deterministic—that is, we will 
encounter cases where the final points on the unit sphere coincide with standard 
basis vectors, whereupon all the probabilities are 0 or 1. However, coping with 
probabilistic output appears necessary to realize the full power of quantum 
computation. Another block, even for the physicists happy with dice, may have 
been the long time it took to realize the essence of computation. Turing’s paper 
reached full flower only with the emergence of computing machines during and 
after World War II. 

Even then, it took Richard Feynman, arguably the main visionary in the area 
after the passing of John von Neumann in 1957, until his last decade in the
1980s to set his vision down (Feynman, 1982, 1985). That is when it came to 
the attention of David Deutsch (1985) and some others. The theory still had 
several false starts—for instance, the second of us overlapped with Deutsch in 
1984-1986 at Oxford’s Mathematical Institute and saw its fellows turn aside 
Deutsch’s initial claims to be able to compute classically uncomputable functions.

It can take a long time for a great theory to mature, but a great theorem 
such as Peter Shor’s on quantum factoring can accelerate it a lot. We hope 
this book helps make the ascent to understanding it easier. We chose M = 21
in section 1.4 because it is currently the highest integer on which practical 
runs of Shor’s algorithm have been claimed, but even these are not definitively 
established (Smolin et al., 2013). 
Numbers and Strings


Before we can start to present quantum algorithms, we need to discuss the 
interchangeable roles of natural numbers and Boolean strings. The set $\mathbb{N}$ of
natural numbers consists of 

0,1,2,3,... 

as usual. 

A Boolean string is a string that consists solely of bits: a bit is either 0 or 1. 
In computer science, such strings play a critical role, as you probably already 
know, because our computational devices are all based on being either “on” or
“off,” “charged” or “uncharged,” “magnetized” or “unmagnetized,” and so on.

The operations we use on natural numbers are the usual ones. For example, 
$x + y$ is the sum of $x$ and $y$, and $x \cdot y$ is their product. There is nothing new
or surprising here. The operations we use on Boolean strings are also quite 
simple: The length of a string is the number of bits in the string, and if x and 
y are Boolean strings, then xy is their concatenation. Thus, if x = 0101 and 
y = 111, then we have xy = 0101111. This is a kind of “product” operation on 
strings, but we find it convenient not to use an explicit operator symbol. If you 
see xy and both x and y are strings, then xy is the result of concatenating them
together. 

What we need to do is switch from numbers to Boolean strings and back. 
Sometimes it is best to use the number representation and other times the string 
representation. This kind of dual nature is basic in computer science and will 
be used often in describing the quantum algorithms. There are, however, some 
hitches that must be regarded to make it work properly. Let’s look and see why. 

If $m$ is a natural number, then it can uniquely be written as a binary number:
Let

$$m = 2^{n-1} x_{n-1} + \cdots + 2 x_1 + x_0,$$

where each $x_i$ is a bit, and we insist that $x_{n-1}$ is nonzero. Then we can use $m$ to
denote the Boolean string $x_{n-1} \cdots x_1 x_0$. For instance, 7 maps to the string 111.
In the reverse direction, we can use the string

$$x_{n-1} \cdots x_1 x_0$$

to denote the natural number

$$2^{n-1} x_{n-1} + \cdots + 2 x_1 + x_0.$$

For example, the string 10010 is the number 16 + 2 = 18.
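
Both directions of this correspondence are easy to try out. In the sketch below the function names are ours; Python's built-in `format` and `int` do the actual conversion.

```python
def number_to_string(m):
    """Binary string of a natural number m >= 1, starting with its leading 1."""
    if m < 1:
        raise ValueError("a leading 1 requires m >= 1")
    return format(m, "b")

def string_to_number(s):
    """Natural number denoted by the Boolean string s."""
    return int(s, 2)

print(number_to_string(7))        # the string for the number 7
print(string_to_number("10010"))  # the number for the string 10010
```

The check on `m` reflects the insistence on a nonzero leading bit, which is exactly why zero needs separate treatment.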

Often we will be concerned with numbers in a fixed range $0 \ldots N-1$ where
$N = 2^n$. This range is denoted by $[N]$. Then it will be convenient to omit the
leading-1 requirement and use $n$ bits for each number, so that zero is $n$-many 0s, one is
$0^{n-1}1$, and so on up to $N - 1 = 1^n$, where $1^n$ means $n$-many 1s. We call this
the canonical numbering of $\{0,1\}^n$. For example, with $n = 3$:


000 = 0        100 = 4
001 = 1        101 = 5
010 = 2        110 = 6
011 = 3        111 = 7.


The small but important issue is that, for the representation from numbers to 
strings to be unambiguous, we must know how long the strings are. Otherwise, 
what does the number 0 represent? Does it represent 0 or 00 or 000 and so 
on? This is why we said earlier that the mapping between numbers and strings 
is not exact. To make it precise, we need to know the length of the strings. A 
more technical way of saying this is that once we specify the mapping as being 
between the natural numbers $0, 1, \ldots, 2^n - 1$ and the strings of length $n$ (that
is, $\{0,1\}^n$), it is one-to-one. Note that 0 as a number now corresponds to the
unique string

$$\underbrace{0 \cdots 0}_{\text{total of } n \text{ zeros}}.$$
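
Once the length $n$ is fixed, the canonical numbering is a simple zero-padded conversion. The following sketch uses names of our own choosing (`canonical_string`, `canonical_index`).

```python
def canonical_string(k, n):
    """The k-th string of {0,1}^n in the canonical numbering, for 0 <= k < 2^n."""
    if n < 1 or not 0 <= k < 2 ** n:
        raise ValueError("need n >= 1 and 0 <= k < 2**n")
    return format(k, f"0{n}b")   # binary, zero-padded to exactly n bits

def canonical_index(x):
    """The number in [0, 2^n - 1] that the length-n string x stands for."""
    return int(x, 2)

for k in range(8):
    print(canonical_string(k, 3), "=", k)
```

With the length supplied, the number 0 is no longer ambiguous: `canonical_string(0, 5)` is the unique string of five zeros.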

There is one more operation that we use on Boolean strings. If $x$ and $y$ are
Boolean strings of length $m$, then $x \bullet y$ is their Boolean inner product, which
is defined to be

$$x_1 y_1 \oplus \cdots \oplus x_m y_m.$$

Here $\oplus$ means exclusive-or, which is the same as addition modulo 2. Hence,
sometimes we may talk about Boolean strings as being members of an
$m$-dimensional space with addition modulo 2. We must warn that the name inner
product is also used when we talk about Hilbert spaces in chapter 3. Many
sources use $x \cdot y$ to mean concatenation of strings, but we reserve the lighter
dot for numerical multiplication. When $x$ and $y$ are single bits, $x \cdot y$ is the same
as $x \bullet y$, but using the lighter dot still helps remind us that they are single bits.
Sometimes this type of overloading occurs in mathematics; we try to make
clear which is used when.
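
The Boolean inner product takes only a few lines of code. This sketch represents Boolean strings as Python strings of the characters 0 and 1; the function name is ours.

```python
def boolean_inner_product(x, y):
    """XOR of the bitwise ANDs of two equal-length Boolean strings."""
    if len(x) != len(y):
        raise ValueError("strings must have the same length")
    result = 0
    for xi, yi in zip(x, y):
        result ^= int(xi) & int(yi)   # multiply the bits, then add modulo 2
    return result

print(boolean_inner_product("101", "111"))
print(boolean_inner_product("11", "01"))
```

In the first call two positions contribute a 1, and the two 1s cancel modulo 2; in the second call only one position contributes.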

A further neat property of Boolean strings is that they can represent subsets 
of a set. If the set is {1,2,3}, in that order, then 000 corresponds to the empty 
set, 011 to {2,3}, 100 to {1}, 111 to the whole set {1,2,3}, and so on. 
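
The subset correspondence can be sketched as follows, with the set given as an ordered list and with function names of our own choosing.

```python
def subset_from_string(bits, universe):
    """Elements of `universe` whose position carries a 1 in `bits`."""
    return {elem for bit, elem in zip(bits, universe) if bit == "1"}

def string_from_subset(subset, universe):
    """Boolean string marking which elements of `universe` lie in `subset`."""
    return "".join("1" if elem in subset else "0" for elem in universe)

universe = [1, 2, 3]
print(subset_from_string("011", universe))
print(string_from_subset({1}, universe))
```

The order of `universe` matters: the same string picks out a different subset if the elements are listed in a different order.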
2.1 Asymptotic Notation


Suppose we run an algorithm that on problems of size n works in n “passes,” 
where the i-th pass takes i “steps.” How long does the whole algorithm take? 
If we want to be exact about the number $s(n)$ of steps, then we can calculate:

$$s(n) = \sum_{i=1}^{n} i = \frac{n(n+1)}{2} = \frac{1}{2}n^2 + \frac{1}{2}n.$$

If we view the process pictorially, then we see the passes are tracing out a triangular
half of an $n \times n$ square along the main diagonal. This intuition says “$\frac{1}{2}n^2$”
without worrying about whether the main diagonal is included or excluded or
“halved.” The difference is a term $\frac{1}{2}n$ whose added size is relatively tiny as $n$
becomes moderately large, so we may ignore it. Formally, we define:


Definition 2.1 Two functions $s(n)$ and $t(n)$ on $\mathbb{N}$ are asymptotically equivalent,
written $s(n) \sim t(n)$, if $\lim_{n \to \infty} \frac{s(n)}{t(n)}$ exists and equals 1.
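
As a worked check of this definition (our addition), take the step count $s(n) = \frac{1}{2}n^2 + \frac{1}{2}n$ from the example and compare it with $t(n) = \frac{1}{2}n^2$:

```latex
\frac{s(n)}{t(n)}
  = \frac{\tfrac{1}{2}n^2 + \tfrac{1}{2}n}{\tfrac{1}{2}n^2}
  = 1 + \frac{1}{n}
  \;\longrightarrow\; 1
  \quad \text{as } n \to \infty .
```

The lower-order term contributes only the vanishing $\frac{1}{n}$, which is the precise sense in which it may be ignored.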

So $s(n) \sim \frac{1}{2}n^2$, which we can also encapsulate by saying $s(n)$ is quadratic
with “principal constant” $\frac{1}{2}$. But suppose now we don’t know or care about
the actual time units for a “step,” only that the algorithm’s cost scales as $n^2$.
Another way of saying this is that as the data size $n$ doubles, the time for
the algorithm goes up by about a factor of 4. This idea doesn’t care what the
constant multiplying $n^2$ is, only that it is some constant. Hence, we define:
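
The exact count and the doubling behavior can both be checked by brute force. In this sketch the function name `steps` is ours; it simply adds up the cost of each pass.

```python
def steps(n):
    """Total steps when pass i takes i steps, for i = 1, ..., n."""
    return sum(range(1, n + 1))

for n in (10, 100, 1000):
    ratio = steps(2 * n) / steps(n)
    print(n, steps(n), round(ratio, 4))
```

The ratio approaches 4 as $n$ grows, and it would do so for any positive constant multiplying $n^2$.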


Definition 2.2 Given two functions $s(n)$, $t(n)$ on $\mathbb{N}$, write:

• $s(n) = O(t(n))$ if there are constants $c, d$ such that for all $n$,
  $s(n) \le c \cdot t(n) + d$.

• $s(n) = \Omega(t(n))$ if $t(n) = O(s(n))$, and

• $s(n) = \Theta(t(n))$ if $s(n) = O(t(n))$ and $s(n) = \Omega(t(n))$.


In the first case, we say $s(n)$ is “order-of” $t(n)$ or “Big-Oh-of” $t(n)$, whereas in
the second, we might say $s(n)$ is “asymptotically bounded below by” $t(n)$, and
in the third, we say $s(n)$ and $t(n)$ have the same “asymptotic order.”

A sufficient condition for $s(n) = \Theta(t(n))$ is that the limit $\lim_{n \to \infty} \frac{s(n)}{t(n)}$ exists
and is some positive number. If the limit is zero, then we write $s(n) = o(t(n))$
instead; this “little-oh” notation is stronger than writing $s(n) = O(t(n))$ here.
Thus, we can say about our $s(n)$ example above:






• $s(n) = \Theta(n^2)$;

• $s(n) = o(n^3)$;

• $\log(s(n)) = O(\log n)$.

Indeed, the last gives $\log(s(n)) = \Theta(\log n)$, but it does not give
$\log(s(n)) \sim \log(n)$ because the exponent 2 in $s(n)$ becomes a multiplier of 2
on the logarithm. The choice of base for the logarithm also affects the constant
multiplier, but not the $\Theta$ relation. The latter enables us not to care about what
the base is or even whether two logarithms have the same base. This kind of freedom
is important when analyzing the costs of algorithms and even in thinking about what
the goals of designing them are.
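
The claims about the running example $s(n)$ can be checked numerically. The following is only a small sketch under our own choice of sample sizes: it confirms the closed form and prints three ratios, which should approach 1, 0, and 2 respectively as $n$ grows.

```python
import math

def s(n: int) -> int:
    """Exact step count: pass i takes i steps, for i = 1..n."""
    return sum(range(1, n + 1))

for n in (10, 1000, 10**6):
    assert s(n) == n * (n + 1) // 2          # closed form n(n+1)/2
    print(n,
          s(n) / (0.5 * n * n),              # tends to 1: s(n) ~ n^2 / 2
          s(n) / n**3,                       # tends to 0: s(n) = o(n^3)
          math.log(s(n)) / math.log(n))      # tends to 2, not 1
```

The last ratio approaches 2 only slowly, which is the numerical face of the remark that $\log(s(n))$ is $\Theta(\log n)$ without being asymptotically equivalent to $\log n$.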

One further important idea, which we will begin employing in chapter 13,
uses logarithms in a different way. Write $f(n) = \tilde{O}(g(n))$ if there is some finite
$k$ such that $f(n) = O(g(n)(\log g(n))^k)$. This is pronounced “$f$ is Oh-tilde of $g$,”
and carries the idea that sometimes logarithmic as well as constant factors can
be effectively ignored.


2.2 Problems 

2.1. Let $x$ be a Boolean string. What type of number does the Boolean string
$x0$ represent?

2.2. Let $x$ be a Boolean string with exactly one bit a 1. What can you say
about the number it represents? Does this identification depend on using the
canonical numbering of $\{0,1\}^n$, where $n$ is the length of $x$?

2.3. Compute the $4 \times 4$ “times-table” of $x \bullet y$ for $x, y \in \{00, 01, 10, 11\}$.
Then write the entries in the form $(-1)^{x \bullet y}$.

2.4. Let $x$ be a Boolean string of even length. Can the Boolean string $xxx$ ever
represent a prime number in binary notation?

2.5. Show that a function $f: \mathbb{N} \to \mathbb{N}$ is bounded by a constant if and only if
$f(n) = O(1)$, and is linear if and only if $f(n) = \Theta(n)$.

2.6. Show that a function $f: \mathbb{N} \to \mathbb{N}$ is bounded by a polynomial in $n$, written
$f(n) = n^{O(1)}$, if and only if there is a constant $C$ such that for all sufficiently
large $n$, $f(2n) \le C f(n)$. How does $C$ relate to the exponent $k$ of the polynomial?
Thus, we can characterize algorithms that run in polynomial time as those
for which the amount of work scales up only linearly as the size of the data
grows linearly. Later we will use this criterion as a benchmark for feasible
computation.

2.7. Let $M$ be an $n$-bit integer, and let $a < M$. Give a bound of the form $O(s(n))$
for the time needed to find the remainder when $M$ is divided into $a^2$.

2.8. Now suppose we want to compute $a^{75}$ modulo $M$. Give a concrete bound
on the number of squarings and divisions by $M$ one needs to do, never allowing
any number to become bigger than $M^2$.

2.9. Use the ideas of problems 2.7 and 2.8 to show that given any $a < M$, the
function $f_a$ defined for all integers $x < M$ by

\[ f_a(x) = a^x \bmod M \]

can be computed in $n^{O(1)}$ time.


2.3 Summary and Notes 

Numbers and strings are made out of the same “stuff,” which are characters
over a finite alphabet. With numbers we call them “digits,” whereas with binary
strings we call them “bits,” but to a computer they are really the same. Switching
mentally from one to the other is often a powerful way to promote one’s
understanding of theoretical concepts. This is especially true in quantum
computations, where the bits of a Boolean string (or rather their indexed
locations in the string) will be treated as quantum coordinates or qubits. The next
chapter shows how we use both numbers and strings as indices to vectors and
matrices.

It is interesting that while the notion of natural numbers is ancient, the notion 
of Boolean strings is much more recent. Even more interesting is that it is only 
with the rise of computing that the importance of using just the two Boolean 
values 0,1 has become so clear. Asymptotic notation helped convince us that 
whether we operate in base 10 or 2 or 16 or 64, the difference is secondary 
compared to the top-level structure of the algorithm, which usually determines 
the asymptotic order of the running time. 
A vector $a$ of dimension $N$ is an ordered list of values, just as usual. Thus, $a$ of
dimension $N$ stands for:

\[ a = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{N-1} \end{bmatrix}. \]

Instead of the standard subscript notation to denote the $k$-th element, we will
use $a(k)$ to denote it. This has the advantage of making some of our equations
a bit more readable, but just remember that

$a(k)$ is the same as $a_k$.

One rationale for this notation is that a vector can often be best viewed as being
indexed by other sets rather than just the usual $0, \ldots, N-1$. The functional
notation $a(k)$ seems to be more flexible in this regard.

A fancier justification is that sometimes vector spaces are defined as functions
over sets, so this notation is consistent with that. In any event, $a(k)$ is
just the element of the vector that is indexed by $k$. A concrete example is to
consider a vector $a$ of dimension 4. In this case, we may use the notation

$a(0), a(1), a(2), a(3)$,

for its elements, or we may employ the notation

$a(00), a(01), a(10), a(11)$,

using Boolean strings to index the elements.

A philosophical justification for our functional notation is that vectors are
pieces of code. We believe that nature computes with code, not with the graph
of the code. For instance, each elementary standard basis vector $e_k$ is 0 except
for the $k$-th coordinate, which is 1, and we use the subscript when thinking
of it as an object. When $N = 2^n$, we index the complex coordinates from 0
as $0, \ldots, N-1$ and enumerate $\{0,1\}^n$ as $x_0, \ldots, x_{N-1}$, but we index the binary
string places from 1 as $1, \ldots, n$. Doing so helps tell them apart. When thinking
of $e_k$ as a piece of code, we get the function $e_k(x) = 1$ if $x = x_k$ and $e_k(x) = 0$
otherwise. We can also replace $k$ by a binary string as a subscript, for instance,
writing the four standard basis vectors when $n = 2$ as $e_{00}, e_{01}, e_{10}, e_{11}$.
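
The “vectors as code” viewpoint can be made concrete in a few lines. This is only a sketch: the names `strings`, `e`, and `as_list`, and the sample vector `a`, are our own illustrative choices and not notation from the text.

```python
from itertools import product

n = 2
N = 2 ** n

# Enumerate {0,1}^n in canonical order: x_0 = "00", x_1 = "01", ...
strings = ["".join(bits) for bits in product("01", repeat=n)]

def e(k: int):
    """Standard basis vector e_k viewed as code: a function on strings."""
    return lambda x: 1 if x == strings[k] else 0

def as_list(vec):
    """Tabulate a function-style vector into its list of N coordinates."""
    return [vec(x) for x in strings]

# A vector indexed by Boolean strings: a(00), a(01), a(10), a(11).
a = {"00": 0.5, "01": -0.5, "10": 0.5, "11": 0.5}

print(strings)         # the canonical enumeration of {0,1}^2
print(as_list(e(2)))   # e_2, also written e_10
print(a[strings[3]])   # a(11), the same entry as a(3)
```

Tabulating the function gives back the familiar coordinate list, so nothing is lost by switching between the two views.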





Chapter 3 Basic Linear Algebra 


3.1 Hilbert Spaces 


A real Hilbert space is nothing more than a fancy name for the usual Euclidean
space. We will use $\mathbb{H}_N$ to denote this space of dimension $N$. The elements are
real vectors $a$ of dimension $N$. They are added and multiplied by scalars in the
usual manner:

• If $a, b$ are vectors in this space, then so is $a + b$, which is defined by

\[ \begin{bmatrix} a(0) \\ \vdots \\ a(N-1) \end{bmatrix} + \begin{bmatrix} b(0) \\ \vdots \\ b(N-1) \end{bmatrix} = \begin{bmatrix} a(0) + b(0) \\ \vdots \\ a(N-1) + b(N-1) \end{bmatrix}. \]

• If $a$ is a vector again in this space and $c$ is a real number, then $b = ca$ is
defined by

\[ c \begin{bmatrix} a(0) \\ \vdots \\ a(N-1) \end{bmatrix} = \begin{bmatrix} ca(0) \\ \vdots \\ ca(N-1) \end{bmatrix}. \]

The abstract essence of a Hilbert space is that each vector has a norm: The
norm of a vector $a$, really just its length, is defined to be

\[ \|a\| = \Big( \sum_{k} a(k)^2 \Big)^{1/2}. \]

Note that in the case of two dimensions, the norm of the vector

\[ a = \begin{bmatrix} r \\ s \end{bmatrix} \]

is its usual length in the plane, $\sqrt{r^2 + s^2}$. A Hilbert space simply generalizes
this to many dimensions. A unit vector is just a vector of norm 1. Unit vectors
together comprise the unit sphere in any Hilbert space.
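
As a quick illustration of the norm, here is a minimal sketch using only the Python standard library. The sample vector $(3, 4)$ is our own choice.

```python
import math

def norm(a):
    """Euclidean norm: square root of the sum of squared coordinates."""
    return math.sqrt(sum(x * x for x in a))

a = [3.0, 4.0]                       # the vector (r, s) = (3, 4)
unit = [x / norm(a) for x in a]      # rescaled to lie on the unit sphere
print(norm(a), norm(unit))
```

Dividing any nonzero vector by its norm is the standard way to obtain a unit vector pointing in the same direction.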


3.2 Products and Tensor Products 

The ordinary Cartesian product of an $m$-dimensional Hilbert space $\mathbb{H}_1$ and
an $n$-dimensional Hilbert space $\mathbb{H}_2$ is the $(m+n)$-dimensional Hilbert space
obtained by concatenating vectors from the former with vectors from the latter.
Its vectors have the form $a(i)$ with $i \in [m+n]$.

Their tensor product $\mathbb{H}_1 \otimes \mathbb{H}_2$, however, has vectors of the form $a(k)$, where
$1 \le k \le mn$. Indeed, $k$ is in 1-1 correspondence with pairs $(i,j)$ of indices
where $i \in [m]$ and $j \in [n]$. Because we regard indices as strings, we can write
them juxtaposed as $a(ij)$.

The tensor product of two vectors $a$ and $b$ is the vector $c = a \otimes b$ defined by

\[ c(ij) = a(i)b(j). \]

A vector denoting a pure quantum state is separable if it is the tensor product
of two other vectors; otherwise it is entangled. The vectors $e_{00}$ and $e_{11}$ are
separable, but their unit-scaled sum $\frac{1}{\sqrt{2}}(e_{00} + e_{11})$ is entangled. The standard
basis vectors of $\mathbb{H}_1 \otimes \mathbb{H}_2$ are separable, but this is not true of many of their
linear combinations, all of which still belong to $\mathbb{H}_1 \otimes \mathbb{H}_2$ because it is a Hilbert
space.

Often our quantum algorithms will operate on the product of two Hilbert
spaces, each using binary strings as indices, which gives us the space of vectors
of the form $a(xy)$. Here $x$ ranges over the indices of the first space and $y$ over
those of the second space. Writing $a(xy)$ does not entail that $a$ is separable.
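
The definition $c(ij) = a(i)b(j)$ and the separable/entangled distinction can be tried out directly. The sketch below is ours, not the text's: the helper names are hypothetical, and the separability test relies on the standard fact that a vector in a $2 \times 2$ tensor product is a product vector exactly when $c(00)c(11) = c(01)c(10)$.

```python
import math

def tensor(a, b):
    """c(ij) = a(i) * b(j), with the index pairs ij listed in row-major order."""
    return [x * y for x in a for y in b]

def is_separable_2x2(c, tol=1e-12):
    """For dimension 2 x 2: c is a product vector iff c(00)c(11) = c(01)c(10)."""
    return abs(c[0] * c[3] - c[1] * c[2]) < tol

e0, e1 = [1.0, 0.0], [0.0, 1.0]
e00, e11 = tensor(e0, e0), tensor(e1, e1)
bell = [(x + y) / math.sqrt(2) for x, y in zip(e00, e11)]

print(e00, is_separable_2x2(e00))    # a standard basis vector: separable
print(bell, is_separable_2x2(bell))  # the unit-scaled sum: entangled
```

The test is specific to two two-dimensional factors; larger spaces need a rank computation instead.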


3.3 Matrices 

Matrices represent linear operators on Hilbert spaces. We can add them
together, we can multiply them, and of course we can use them to operate on
vectors. We assume these notions are familiar to you; if not, please see sources
in this chapter’s end notes. A typical example is

\[ UVa = b. \]

This means: apply the $V$ transform to the vector $a$, then apply the $U$ transform
to the resulting vector, and the answer is the vector $b$. The matrix $I_N$ denotes
the $N \times N$ identity matrix. We use square brackets for matrix entries to
distinguish them further from vector amplitudes, so that the identity matrix has
entries $I_N[r,c] = 1$ if $r = c$, and $I_N[r,c] = 0$ otherwise. One of the key properties
of matrices is that they define linear operations, namely:

\[ U(a + b) = Ua + Ub. \]

The fact that all our transformations are linear is what makes quantum
algorithms so different from classical ones. This will be clearer as we give
examples, but the linearity restriction both gives quantum algorithms great power
and, curiously, sets them apart from classical algorithms.

If $U$ is a matrix, then we use $U^k$ to denote the $k$-th power of the matrix:

\[ U^k = \underbrace{U U \cdots U}_{k \text{ copies}}. \]


Definition 3.1 The transpose of a matrix $U$ is the matrix $V$ such that

\[ V[r,c] = U[c,r]. \]

We use $U^T$ as usual to denote the transpose of a matrix. We also use transpose
for vectors but only when writing them after a matrix in lines of text.
Generally, we minimize the distinction between row and column vectors, using the
latter as standard. The inner product of two real vectors $a$ and $b$ is given by

\[ \langle a, b \rangle = \sum_{k=0}^{m} a(k)b(k). \]

Definition 3.2 A real matrix $U$ is unitary provided $U^T U = I$.

Here are three unitary $2 \times 2$ real matrices. Note that the last, called the
Hadamard matrix, requires a constant multiplier to divide out a factor of
2 that comes from squaring it.

\[ I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad X = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \qquad H = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}. \]

The first two are also permutation matrices, meaning square matrices each
of whose rows and columns has all zeros except for a single 1. All permutation
matrices are unitary.

Another definition of a unitary matrix is based on the notion of orthogonality.
Call two vectors $a$ and $b$ orthogonal if their inner product is 0. A matrix
$U$ is unitary provided each row is a unit vector and any two distinct rows are
orthogonal. The reason these matrices are so important is that they preserve
the Euclidean length.

Lemma 3.3 If $U$ is a unitary matrix and $a$ is a vector, then $\|Ua\| = \|a\|$.

Proof. Direct calculation gives:

\begin{align*}
\|Ua\|^2 &= \sum_x |Ua(x)|^2 \\
&= \sum_x Ua(x) \cdot Ua(x) \\
&= \sum_x \Big( \sum_y U[x,y]a(y) \Big) \Big( \sum_z U[x,z]a(z) \Big) \\
&= \sum_x \Big( \sum_y \sum_z U[x,y]U[x,z]a(y)a(z) \Big) \\
&= \sum_y \sum_z \Big( \sum_x U[x,y]U[x,z] \Big) a(y)a(z) \\
&= \sum_y a(y)a(y) = \|a\|^2
\end{align*}

because the inner product of $U[-,y]$ and $U[-,z]$ is 1 or 0 according as $y = z$.

□
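
Definition 3.2 and Lemma 3.3 are easy to test on the three matrices shown above. The following sketch uses plain Python lists; the helper names and the test vector $a = (0.6, -0.8)$ are our own assumptions.

```python
import math

def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def apply(U, a):
    return [sum(U[r][c] * a[c] for c in range(len(a))) for r in range(len(U))]

def norm(a):
    return math.sqrt(sum(x * x for x in a))

def is_unitary(U, tol=1e-12):
    """Real case of Definition 3.2: check that U^T U is the identity."""
    P = matmul(transpose(U), U)
    return all(abs(P[r][c] - (1.0 if r == c else 0.0)) < tol
               for r in range(len(U)) for c in range(len(U)))

h = 1 / math.sqrt(2)
I2 = [[1.0, 0.0], [0.0, 1.0]]
X = [[0.0, 1.0], [1.0, 0.0]]
H = [[h, h], [h, -h]]

a = [0.6, -0.8]                      # a unit vector chosen for illustration
for name, U in (("I", I2), ("X", X), ("H", H)):
    print(name, is_unitary(U), norm(apply(U, a)))
```

Dropping the factor $1/\sqrt{2}$ from $H$ makes `is_unitary` fail, which is the “factor of 2 that comes from squaring it.”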

As far as we can, we try not to care about whether a Hilbert space is real or 
complex. However, we do need notation for complex spaces. 


3.4 Complex Spaces and Inner Products 

To describe some algorithms, mainly Shor’s, we need to use complex vectors
and matrices. In this case, all definitions are the same except for the notion of
transpose and inner product. Both now need to use the conjugation operation:
if $z = x + iy$ where $x, y$ are real numbers and $i = \sqrt{-1}$ as usual, then the
conjugate of $z$ is $x - iy$ and is denoted by $\bar{z}$. We now define the adjoint of a matrix
$U$ to be the matrix $V = U^*$ such that

\[ V[r,c] = \overline{U[c,r]}. \]

A complex matrix $U$ is unitary provided $U^*U = I$. Furthermore, the inner
product of two complex vectors $a$ and $b$ is defined to be

\[ \langle a, b \rangle = \sum_{k=0}^{m} \overline{a(k)}\, b(k). \]

Note that because $\bar{r}$ is the same as $r$ for a real number $r$, these concepts
are the same as what we defined before when the entries of the vectors and
matrices are all real numbers. Rather than use special notation for the complex
case, we will use the same as in the real case; the context should make it clear
which is being used. The one caveat is that in the few cases where the entries
of $a$ or $U$ are complex, one needs to conjugate them. Upon observing this, the
proof of the following is much the same as for Lemma 3.3 above:

Lemma 3.4 If $U$ is a unitary matrix and $a$ is a vector, then the length of $Ua$
is the same as the length of $a$.
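
To see why the conjugation matters, the sketch below takes a small complex matrix and checks $U^*U = I$ and Lemma 3.4 numerically. The matrix `U` and the vector `a` are hypothetical examples of our own, and placing the conjugate on the first argument of the inner product is an assumption about the convention.

```python
import math

def adjoint(U):
    """Conjugate transpose: V[r][c] is the conjugate of U[c][r]."""
    return [[U[c][r].conjugate() for c in range(len(U))]
            for r in range(len(U[0]))]

def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]

def inner(a, b):
    """Complex inner product, conjugating the first argument (assumed)."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

s = 1 / math.sqrt(2)
U = [[s + 0j, s * 1j],
     [s * 1j, s + 0j]]          # hypothetical example of a complex unitary
a = [1 + 2j, 3 - 1j]
Ua = [sum(U[r][c] * a[c] for c in range(2)) for r in range(2)]

print(matmul(adjoint(U), U))                 # close to the identity
print(inner(a, a).real, inner(Ua, Ua).real)  # squared lengths agree
```

For this `U` the plain transpose does not work: $U^T U$ has zeros on its diagonal, so the real-case definition would wrongly reject it.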


We can also form tensor products of matrices having any dimensions. If $U$
is $m \times n$ and $V$ is $r \times s$, then $W = U \otimes V$ is the $mr \times ns$ matrix whose action
on product vectors $c(ij) = a(i)b(j)$ is as follows:

\[ (Wc)(ij) = (Ua)(i)\,(Vb)(j). \]

Because every vector $d$ of dimension $rs$ (whether entangled or not) can be
written as a linear combination of basis vectors, each of which is a product
vector, the action $Wd$ is well defined via the same linear combination of the
outputs on the basis vectors. That is, if

\[ d = \sum_{i=1}^{r} \sum_{j=1}^{s} d_{ij}\, (e_i \otimes e_j), \]

then

\[ Wd = \sum_{i=1}^{r} \sum_{j=1}^{s} d_{ij}\, (Ue_i) \otimes (Ve_j). \]

Note that it does not matter whether the scalars $d_{ij}$ are regarded as multiplying
$e_i$, $e_j$, or the whole thing. This fact matters later in section 6.5. We mainly
use tensor products to combine operations that work on separate halves of an
overall index $ij$.
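
The defining property $(Wc)(ij) = (Ua)(i)(Vb)(j)$ can be checked on a small case. This is a sketch with helper names of our own; it uses the matrices $H$ and $X$ from section 3.3 and two arbitrary vectors.

```python
def kron(U, V):
    """Tensor product of matrices given as lists of rows."""
    return [[u * v for u in urow for v in vrow] for urow in U for vrow in V]

def tensor(a, b):
    """Tensor product of vectors: c(ij) = a(i) * b(j)."""
    return [x * y for x in a for y in b]

def apply(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

h = 2 ** -0.5
H = [[h, h], [h, -h]]
X = [[0, 1], [1, 0]]
a, b = [1.0, 2.0], [3.0, -1.0]

lhs = apply(kron(H, X), tensor(a, b))     # W c with W = H (x) X
rhs = tensor(apply(H, a), apply(X, b))    # (H a) (x) (X b)
print(lhs)
print(rhs)
```

The two printed vectors agree up to rounding, as the definition requires for every product vector.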


3.5 Matrices, Graphs, and Sums Over Paths 

One rich source of real-valued matrices is graphs. A graph $G$ consists of a
set $V$ of vertices, also called nodes, together with a binary relation $E$ on $V$
whose members are called edges. The adjacency matrix $A = A(G)$ of a graph
$G = (V, E)$ is defined for all $u, v \in V$ by:

\[ A[u,v] = \begin{cases} 1 & \text{if } (u,v) \in E \\ 0 & \text{otherwise.} \end{cases} \]

If $E$ is a symmetric relation, then $A$ is a symmetric matrix. In that case, $G$ is an
undirected graph; otherwise it is directed.

The degree of a vertex $u$ is the number of edges incident to $u$, which is
the same as the number of 1s in row $u$ of $A$. A graph is regular of degree $d$ if
every vertex has degree $d$. In that case, consider the matrix $A' = \frac{1}{d}A$. Figure 3.1
exemplifies this for a graph called the four-cycle, $C_4$. This graph is bipartite,
meaning that $V$ can be partitioned into $V_1, V_2$ such that every edge connects a
vertex in $V_1$ and a vertex in $V_2$.

Figure 3.1
Four-cycle graph $G = C_4$, stochastic adjacency matrix $A'_G$, and unitary matrix $U_G$.

Because every row has non-negative numbers summing to 1, $A'$ is a stochastic
matrix. Because the columns also sum to 1, $A'$ is doubly stochastic.

However, $A'$ is not unitary for two reasons. First, the squared Euclidean norm of
each row and column is $(\frac{1}{2})^2(1 + 1) = \frac{1}{2}$, not 1. Second, not all pairs of distinct
rows or columns are orthogonal. To meet the first criterion, we multiply by
$\frac{1}{\sqrt{2}}$ instead of $\frac{1}{2}$. To meet the second, we can change the entries for the edge
between nodes 3 and 4 from 1 to $-1$, creating the matrix $U_G$, which is also
shown in figure 3.1. Then $U_G$ is unitary; in fact, it is the tensor product $H \otimes X$
of two $2 \times 2$ unitary matrices given after definition 3.2.
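
The conversion of the four-cycle's adjacency matrix into a unitary matrix can be replayed step by step. The sketch below assumes the cycle is labelled 1-2-3-4-1 (the figure itself is not reproduced here, so this labelling is our assumption) and uses helper names of our own.

```python
import math

h = 1 / math.sqrt(2)

# Four-cycle on vertices 1, 2, 3, 4 with edges 1-2, 2-3, 3-4, 4-1
# (this labelling is an assumption; indices below are 0-based).
A = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]

# Scale by 1/sqrt(2) and flip the sign of the edge between nodes 3 and 4.
U = [[h * x for x in row] for row in A]
U[2][3] = U[3][2] = -h

def kron(P, Q):
    """Tensor product of two matrices given as lists of rows."""
    return [[p * q for p in prow for q in qrow] for prow in P for qrow in Q]

def gram(M):
    """Matrix of inner products between the rows of M."""
    return [[sum(x * y for x, y in zip(r1, r2)) for r2 in M] for r1 in M]

H = [[h, h], [h, -h]]
X = [[0, 1], [1, 0]]
HX = kron(H, X)

same = all(abs(U[r][c] - HX[r][c]) < 1e-12 for r in range(4) for c in range(4))
G = gram(U)
unitary = all(abs(G[r][c] - (r == c)) < 1e-12 for r in range(4) for c in range(4))
print(same, unitary)
```

Applying `gram` to the stochastic matrix (entries $\frac{1}{2}$) instead shows both failures at once: diagonal entries of $\frac{1}{2}$ and a nonzero inner product between rows 1 and 3.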

This fact is peculiar to the four-cycle. An example of a $d$-regular graph
whose adjacency matrix cannot similarly be converted into a unitary matrix
is the 3-regular prism graph shown in figure 3.2. Rows 1 and 3 both have nonzero
entries in exactly one common column, namely, column 5, so their dot product
is a single nonzero term, and there is no substitution of nonzero values for
nonzero entries that will make them orthogonal.

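Figure 3.2
3-regular prism graph $G$ and stochastic adjacency matrix $A'_G$.

\[ A'_G = \frac{1}{3} \begin{bmatrix}
0 & 1 & 1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 1 \\
1 & 0 & 0 & 1 & 1 & 0 \\
0 & 1 & 1 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 0 & 1 \\
0 & 1 & 0 & 1 & 1 & 0
\end{bmatrix} \]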


However, in chapter 14, we will see a general technique involving a tensor
product of $A_G$ with another matrix to create a unitary matrix, one representing
a quantum rather than a classical random walk on the graph $G$.

The square of an adjacency matrix has entries given by

\[ A^2[i,j] = \sum_k A[i,k]A[k,j]. \]

The sum counts the number of ways to go from vertex $i$ through some vertex $k$
and end up at vertex $j$. That is, $A^2[i,j]$ counts the number of paths of length 2
from $i$ to $j$. Likewise, $A^3[i,j]$ counts paths of length exactly 3, $A^4[i,j]$ those of
length 4, and so on.
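
The path-counting reading of matrix powers can be confirmed by brute force. In this sketch the graph is the four-cycle labelled 1-2-3-4-1 (our assumed labelling, 0-based in the code), and `count_walks` is a hypothetical helper that enumerates every vertex sequence of the given length; paths here may revisit vertices.

```python
from itertools import product

def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]

def count_walks(A, i, j, length):
    """Count vertex sequences i -> ... -> j that use `length` edges."""
    n, total = len(A), 0
    for mid in product(range(n), repeat=length - 1):
        seq = (i, *mid, j)
        if all(A[u][v] for u, v in zip(seq, seq[1:])):
            total += 1
    return total

A = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]
A2 = matmul(A, A)
A3 = matmul(A2, A)

for i in range(4):
    for j in range(4):
        assert A2[i][j] == count_walks(A, i, j, 2)
        assert A3[i][j] == count_walks(A, i, j, 3)
print(A2)
print(A3)
```

The enumeration takes time exponential in the path length, whereas the matrix power needs only a few multiplications.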

For directed graphs in which each edge goes from one of $n$ sources to one
of $m$ sinks, the adjacency matrices $A_{n,m}$ lend themselves to a path-counting
composition. Consider

\[ U = A_{n,m_1} A_{m_1,m_2} A_{m_2,m_3} \cdots A_{m_k,r}. \]

Note how the $m_i$ dimensions match like dominoes. This makes $U$ a legal matrix
product. The corresponding graph identifies the sinks of the graph $G_\ell$
represented by $A_{m_{\ell-1},m_\ell}$ (here $m_0 = n$) with the sources of $G_{\ell+1}$. Then $U[i,j]$ counts
the number of paths in the product graph that begin at source node $i$ of $G_1$ and
end at sink node $j$ of the last graph $G_{k+1}$.

What links quantum and classical concepts here is that weights can be put on
the edges of these graphs, even complex-number weights. When these are
substituted for the “1” entries of the adjacency matrices, the value $U[i,j]$ becomes
a sum of products over the possible paths. In the classical case, these weights
can be probabilities of individual choices along the paths. In the quantum case,
they can be complex amplitudes, and some of the products in the sum may
cancel. Either way, the leading algorithmic structure is that of a sum over paths.
As a fundamental idea in quantum mechanics, it was advanced particularly by
Richard Feynman, and yet it needs no more than this much about graphs and
linear algebra to appreciate.

This discussion already conveys the basic flavor of quantum operations. The
matrices are unitary rather than stochastic; the entries are square-roots of
probabilities rather than probabilities; negative entries (and later imaginary-number
entries) are used to achieve cancellations. After defining feasible
computations, we will examine some important unitary matrices in greater detail.
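
A two-step example shows the difference between summing probabilities and summing amplitudes. The sketch is ours: `S` is a stochastic matrix whose entries are the squares of those of the Hadamard matrix `H`, and `path_terms` is a hypothetical helper listing the product along each path.

```python
h = 2 ** -0.5
H = [[h, h], [h, -h]]          # amplitudes (unitary)
S = [[0.5, 0.5], [0.5, 0.5]]   # probabilities (entrywise squares of H)

def path_terms(M, i, j):
    """Products along the two length-2 paths i -> k -> j, for k = 0, 1."""
    return [M[i][k] * M[k][j] for k in range(2)]

# Classical: both paths from 0 to 1 contribute positively.
print(path_terms(S, 0, 1), sum(path_terms(S, 0, 1)))

# Quantum: the two paths from 0 to 1 carry opposite signs and cancel,
# while the two paths from 0 back to 0 reinforce each other.
print(path_terms(H, 0, 1), sum(path_terms(H, 0, 1)))
print(path_terms(H, 0, 0), sum(path_terms(H, 0, 0)))
```

This is just the statement $H^2 = I$ read entry by entry as a sum over paths.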


3.6 Problems 


3.1. Show that the product of unitary matrices is unitary. 

3.2. If $U$ is a matrix, then what is $Ue_k$?


3.3. Show that the columns of a unitary matrix are unit vectors. Also show that 
distinct columns are orthogonal. 


3.4. Consider the matrix

\[ \begin{bmatrix} w & w \\ w & -w \end{bmatrix}. \]

For what real values of $w$ is it a unitary matrix?

3.5. Consider the matrix $U$ equal to:

\[ \frac{1}{\sqrt{2}} \begin{bmatrix} I_N & I_N \\ I_N & -I_N \end{bmatrix}. \]

Show that for any $N$, the matrix $U$ is unitary.

3.6. For real vectors $a$ and $b$, show that the inner product can be defined from
the norm as:

\[ \langle a, b \rangle = \frac{1}{2}\left( \|a + b\|^2 - \|a\|^2 - \|b\|^2 \right). \]


3.7. For any complex $N \times N$ matrix $U$, we can uniquely write $U = R + iQ$,
where $Q$ and $R$ have real entries. Show that if $U$ is unitary, then so is the
$2N \times 2N$ matrix $U'$ given in block form by

\[ U' = \begin{bmatrix} R & -Q \\ Q & R \end{bmatrix}. \]

Thus, by doubling the dimension, we can remove the need for complex-number
entries.

3.8. Apply the construction of the last problem to the matrix

\[ Y = \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}. \]

This is the second of the so-called Pauli matrices, along with $X$ above and $Z$
defined below.


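3.9. Consider the following matrix:

\[ V = \frac{1}{\sqrt{2}} \begin{bmatrix} e^{i\pi/4} & e^{-i\pi/4} \\ e^{-i\pi/4} & e^{i\pi/4} \end{bmatrix} = \frac{1}{2} \begin{bmatrix} 1 + i & 1 - i \\ 1 - i & 1 + i \end{bmatrix}. \]

What is $V^2$?

3.10. Let $T_\alpha$ denote the $2 \times 2$ “twist” matrix

\[ T_\alpha = \begin{bmatrix} 1 & 0 \\ 0 & e^{i\alpha} \end{bmatrix}. \]

Show that it is unitary. Also find a complex scalar $c$ such that $cT_\alpha$
has determinant 1 and write out the resulting matrix, which often also goes
under the name $T_\alpha$.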

3.11. The following cases of $T_\alpha$ for $\alpha = \pi, \frac{\pi}{2}, \frac{\pi}{4}$ have special names as shown:

\[ Z = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \qquad S = \begin{bmatrix} 1 & 0 \\ 0 & i \end{bmatrix}, \qquad T = \begin{bmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{bmatrix}. \]

How are they related? Also find an equation relating the Pauli matrices $X$, $Y$, $Z$
along with an appropriate “phase scalar” $c$ as in the last problem.

3.12. In this and the next problem, consider the following commutators of the
three matrices of the last problem with the Hadamard matrix $H$ (noting also
that $H^* = H^{-1} = H$, i.e., the Hadamard is self-adjoint as a unitary matrix):

\[ Z' = HZHZ^*, \qquad S' = HSHS^*, \qquad T' = HTHT^*. \]

Show that $Z'$ and some multiple $cS'$ have nonzero entries that are powers of $i$.

3.13. Show, however, that no multiple $cT'$ has entries of this form by
considering the mutual angles of its entries. What is the lowest power $2^r$ such that
these angles are multiples of $\pi/2^r$?

3.14. Define a matrix to be balanced if all of its nonzero entries have the same 
magnitude. Of all the 2x2 matrices in problem 3.12, say which are balanced. 
Is the property of being balanced closed under multiplication? 

3.15. Show that the rotation matrix by an angle $\theta$ (in the real plane),

\[ R_x(\theta) = \begin{bmatrix} \cos(\theta/2) & \sin(\theta/2) \\ -\sin(\theta/2) & \cos(\theta/2) \end{bmatrix}, \]

is unitary. Also, what does $R_x^2$ represent?

3.16. Show that for every $2 \times 2$ unitary matrix $U$, there are real numbers
$\theta, \alpha, \beta, \delta$ such that

\[ U = e^{i\delta}\, T_\alpha R_\theta T_\beta. \]

Thus, every $2 \times 2$ unitary operation can be decomposed into a rotation flanked
by two twists, multiplied by an arbitrary phase shift by $\delta$. Write out the
decomposition for the matrix $V$ in problem 3.9. (It doesn’t matter which definition of
“$T_\alpha$” you use from problem 3.10.)

3.17. Show how to write $V$ as a composition of Hadamard and $T$ matrices.
(All of these problems point out the special nature of the $T$-matrix and its
close relative, $V$.) Either of these matrices is often called the “$\pi/8$ gate”; the
confusing difference from $\pi/4$ owes to the constant $c$ in problem 3.10.

3.18. Show that the four Pauli matrices, $I$, $X$, $Y$, and $Z$, form an orthonormal
basis for the space of $2 \times 2$ matrices, regarded as a 4-dimensional complex
Hilbert space.

3.19. Define $G$ to be the complete undirected graph on 4 vertices, whose 6
edges connect every pair $(i,j)$ with $i \ne j$. Convert its adjacency matrix $A_G$ into
a unitary matrix by making some entries negative and multiplying by an
appropriate constant. Can you do this in a way that preserves the symmetry of the
matrix?

3.20. Show that a simple undirected graph $G$ has a triangle if and only if $A^2 \odot A$
has a nonzero entry, where $A = A_G$ and $\odot$ here means multiplying entrywise.
This means that triangle-detection in an $n$-vertex graph has complexity no
higher than that of squaring a matrix, which is equivalent to that of multiplying
two $n \times n$ matrices.

Surprisingly, the obvious matrix multiplication algorithm and its $O(n^3)$
running time are far from optimal, and the current best exponent on the “$n$” for
matrix multiplication is about 2.372. This certainly improves on the $O(n^3)$
running time of the simple algorithm that tries every set of three vertices to see if
it forms a triangle.


3.7 Summary and Notes 

There are many good linear algebra texts and online materials. If this material 
is new or you took a class on it once and need a refresher, here are some 
suggested places to go: 

• The textbook Elementary Linear Algebra by Kuttler (2012). 

• Linear algebra video lectures by Gilbert Strang, which are maintained
at MIT OpenCourseWare:
http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/video-lectures/

• The textbook Graph Algorithms in the Language of Linear Algebra by Kepner
and Gilbert (2011).

The famous text by Nielsen and Chuang (2000) includes a full treatment of 
linear algebra needed for quantum computation. Feynman (1982, 1985) wrote 
the first two classic papers on quantum computation. 





Boolean Functions, Quantum Bits, and 
Feasibility 


A Boolean function $f$ is a mapping from $\{0,1\}^n$ to $\{0,1\}^m$, for some numbers
$n$ and $m$. When we define a Boolean function $f(x_1, \ldots, x_n) = (y_1, \ldots, y_m)$, we
think of the $x_i$ as inputs and the $y_j$ as outputs. We also regard the $x_i$ together
as a binary string $x$ and similarly write $y$ for the output. When $m = 1$, there is
some ambiguity between the output as a string or a single bit because we write
just “$y$” not “$(y)$” in the latter case as well, but the difference does not matter
in context. When $m = 1$, you can also think of $f$ as a predicate: $x$ satisfies the
predicate if and only if $f(x) = 1$.

Thus, Boolean functions give us all of the following: the basic truth values,
binary strings, and, as seen in chapter 2, also numbers and other objects. The
most basic have $n = 1$ or 2, such as the unary NOT function, and binary AND,
OR, and XOR. We can also regard the following higher-arity versions as basic:

• AND: This is the function $f(x_1, \ldots, x_n)$ defined as 1 if and only if every
argument is 1. Thus,

$f(1,1,1) = 1$ and $f(1,0,1,1) = 0$.

• OR: This is the function $f(x_1, \ldots, x_n)$ defined as 1 if the number of 1’s in
$x_1, \ldots, x_n$ is non-zero. Thus,

$f(0,1,1) = 1$ and $f(0,0,0,0) = 0$.

• XOR: This is the function $f(x_1, \ldots, x_n)$ defined as 1 if the number of 1’s in
$x_1, \ldots, x_n$ is odd. Thus,

$f(0,1,1) = 0$ and $f(1,1,1,1,1) = 1$.

The latter is true because there are five 1’s.

The binary operations can also be applied on pairs of strings bitwise. For
instance, if $x$ and $y$ are both Boolean strings of length $n$, then $x \oplus y$ is equal to

\[ z = (x_1 \oplus y_1, \ldots, x_n \oplus y_n). \]

We could similarly define the bitwise-AND and the bitwise-OR of two equal-length
binary strings. These are not the same as the above $n$-ary operations but
are instead $n$ applications of binary operations. Each operation connects the $i$-th
bit of $x$ with the $i$-th bit of $y$, for some $i$, and they intuitively run “in parallel.”
The Boolean inner product, which we defined in chapter 2, is computed by
feeding the bitwise binary AND into the $n$-ary XOR, that is:

\[ x \bullet y = \mathrm{XOR}(x_1 \wedge y_1, \ldots, x_n \wedge y_n). \]

In cases like $x \oplus y$, we say we have a circuit of $n$ Boolean gates that
collectively compute the Boolean function $f: \{0,1\}^r \to \{0,1\}^n$, with $r = 2n$, defined
by $f(x,y) = x \oplus y$. Technically, we need to specify whether $x$ and $y$ are given
sequentially as $(x_1, \ldots, x_n, y_1, \ldots, y_n)$ or shuffled as $(x_1, y_1, \ldots, x_n, y_n)$, and this
matters to how we would draw the circuit as a picture. But either way we have
a $2n$-input function that represents the same function of two $n$-bit strings.
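
The $n$-ary operations, the bitwise operations, and the Boolean inner product translate directly into code. The following sketch represents bits as the integers 0 and 1 and strings as tuples; these representations and the function names are our own choices.

```python
def AND(*xs):
    """n-ary AND: 1 if and only if every argument is 1."""
    return int(all(xs))

def OR(*xs):
    """n-ary OR: 1 if at least one argument is 1."""
    return int(any(xs))

def XOR(*xs):
    """n-ary XOR: 1 if the number of 1's is odd."""
    return sum(xs) % 2

def bitwise_xor(x, y):
    """n separate binary XORs, one per position, run 'in parallel'."""
    return tuple(a ^ b for a, b in zip(x, y))

def inner(x, y):
    """Boolean inner product: n-ary XOR of the bitwise ANDs."""
    return XOR(*(a & b for a, b in zip(x, y)))

print(AND(1, 1, 1), AND(1, 0, 1, 1))
print(OR(0, 1, 1), OR(0, 0, 0, 0))
print(XOR(0, 1, 1), XOR(1, 1, 1, 1, 1))
print(bitwise_xor((1, 0, 1), (1, 1, 0)), inner((1, 0, 1), (1, 1, 0)))
```

Note the difference in output type: the $n$-ary functions return a single bit, while `bitwise_xor` returns a string of the same length as its inputs.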

The number of gates is identified with the amount of work or effort expended
by the circuit, and this in turn is regarded as the sequential time for the circuit to
execute. It does not matter too much whether one counts gates or wires between
gates. What is critical is that only basic operations can be used, and that they
can only apply to previously computed values. In the following sketch, the
NOT of $a \vee b$ is allowed because $a \vee b$ has already been computed:

\[ \ldots\ (a \vee b)\ \ldots\ \neg(a \vee b)\ \ldots \]

Two Boolean functions that we should not regard as basic are:

• PRIME: This is the function $f(x_1, \ldots, x_n)$ defined as 1 if the Boolean string
$x = x_1, \ldots, x_n$ represents a number that is a prime number. Recall a prime
number is a natural number $p$ greater than 1 with only 1 and $p$ as divisors.

• FACTOR: This is the function $f(x_1, \ldots, x_n, w_1, \ldots, w_n)$ regarded as having
two integers $x$ and $w$ as arguments; note that we can pad $w$ as well as $x$ by
leading 0’s. It returns 1 if and only if $x$ has no divisor greater than $w$, aside
from $x$ itself.

The game is, how efficiently can we build a circuit to compute these
functions? They are related by $\mathrm{PRIME}(x) = \mathrm{FACTOR}(x, 1)$ for all $x$. This implies
that a circuit for FACTOR immediately gives one for solving the predicate
PRIME because one can simply fix the “$w$” inputs to be the padded version of
the number 1. This does not imply the converse relation, however. Although
both of these functions have been studied for 3,000 years, PRIME was shown
only a dozen years ago to be feasible in a sense we describe next, while many
believe that FACTOR is not feasible at all. Unless you are allowed a quantum
circuit, that is.
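
The two predicates and the relation between them can be written down as a brute-force specification. This is only a sketch of what the functions mean, taking integers rather than bit strings as arguments; it is not an efficient circuit, and its running time is exponential in the number of bits. We check the relation from $x = 2$ upward, leaving aside how $x = 1$ should be treated.

```python
def FACTOR(x: int, w: int) -> int:
    """1 iff x has no divisor d with w < d < x."""
    return int(not any(x % d == 0 for d in range(w + 1, x)))

def PRIME(x: int) -> int:
    """1 iff x > 1 and the only divisors of x are 1 and x."""
    return int(x > 1 and all(x % d for d in range(2, x)))

# Fixing w = 1 turns FACTOR into PRIME.
assert all(PRIME(x) == FACTOR(x, 1) for x in range(2, 200))

print(FACTOR(15, 5), FACTOR(15, 4))   # 15 = 3 * 5: largest proper divisor is 5
print(PRIME(97), PRIME(91))           # 91 = 7 * 13
```

Repeated calls to `FACTOR` with different values of `w` would locate the largest proper divisor of `x` by binary search, which is one way to see why it is named after factoring.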


4.1 Feasible Boolean Functions 

Not all Boolean functions are created equal; some are more complex than
others. In the above examples, which $n$-bit function would you like to compute if
you had to? I think we all will agree that the easiest is the OR function: just
glance at the input bits and check whether one is a 1. If you only see 0’s, then
clearly the OR function is 0. AND is similar.

Next harder it would seem is the XOR function. The intuitive reason is that
now you have to count the number of bits, and this count has to be exact. If
there are 45 inputs that are 1 and you miscount and think there are 44, then
you will get the wrong value for the function. Indeed, one can argue that $n$-ary
XOR is harder than the bitwise-XOR function because each of the $n$ binary
XOR operations is “local” on its own pair of bits.

More difficult is the PRIME function. There is no known algorithm that we
can use and just glance at the bits. Is

101010110110101101011111

a prime number or not? Of course you first might convert it to a decimal
number: it represents the number 11234143. This still requires some work to
see if it has any nontrivial divisors, but it does: 23 and 488441.
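
The arithmetic in this example is easy to reproduce. The sketch below starts from the decimal number 11234143, prints its binary representation, and finds its smallest nontrivial divisor by trial division; `smallest_factor` is a hypothetical helper of our own, and trial division is used only because the number is small.

```python
def smallest_factor(m: int) -> int:
    """Trial division: smallest divisor d > 1 of m (m itself if m is prime)."""
    d = 2
    while d * d <= m:
        if m % d == 0:
            return d
        d += 1
    return m

m = 11234143
bits = format(m, "b")        # binary representation without the "0b" prefix
d = smallest_factor(m)

print(bits, len(bits))
print(d, m // d)
```

Trial division takes about $\sqrt{m}$ steps, which is exponential in the number of bits of $m$, so it does not show that PRIME is feasible in the sense defined below.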

One of the achievements of computer science is that we can define the
classical complexity of a Boolean function. Thus, AND and XOR are computable
in a linear number of steps, that is, $O(n)$. Known circuits for PRIME take
more than linearly many steps, but the time is still polynomial, that is, $n^{O(1)}$.
But there are also Boolean functions that require time exponential in $n$. Many
people believe that FACTOR is one of them, but nobody knows for sure.

To see the issue, consider that any Boolean function can be defined by its
truth table. Here is the truth table for the exclusive-or function XOR:

  x   y   x ⊕ y
  0   0     0
  0   1     1
  1   0     1
  1   1     0

Each row of the table tells you what the function, in this case, the exclusive-or
function, does on the inputs of that row. In general, a Boolean function
$f(x_1, \ldots, x_n)$ is defined by a truth table that has $2^n$ rows, one for each possible
input. Thus, if $n = 3$, there are eight possible rows:

000, 001, 010, 011, 100, 101, 110, and 111.

The difficulty is that as the number of inputs grows, the truth table increases
exponentially in size.

Thus, representing Boolean functions by their truth tables is always possible,
but is not always feasible. The tables will be large when there are thirty inputs,
and when there are over 100, the table would be impossible to write down.
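
Generating a truth table mechanically makes the growth visible. In this sketch, `truth_table` is a hypothetical helper that lists every input of an $n$-input function together with its output; the final loop only prints the number of rows a table would need.

```python
from itertools import product

def truth_table(f, n):
    """All 2**n rows (input tuple, output) of an n-input Boolean function."""
    return [(x, f(*x)) for x in product((0, 1), repeat=n)]

xor = lambda x, y: x ^ y
for row in truth_table(xor, 2):
    print(row)

for n in (3, 10, 30, 100):
    print(n, 2 ** n)     # number of rows the table would need
```

At thirty inputs the table already has more than a billion rows, and at 100 inputs the row count has 31 decimal digits.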

The final technical concept we need is having not just a single Boolean
function but rather a family $[f_n]$ of Boolean functions, each $f_n$ taking $n$ inputs,
that are conceptually related. That is, the $[f_n]$ constitute a single function $f$
on strings of all lengths, so we write $f: \{0,1\}^* \to \{0,1\}$, or for general rather
than one-bit outputs, $f: \{0,1\}^* \to \{0,1\}^*$. Maybe it is confusing to write “$f$”
also for this kind of function with an infinite domain, but the intent is usually
transparent, as when letting AND, OR, and XOR above apply to any $n$. Now
we can finally define “feasible”:

Definition 4.1 A Boolean function $f = [f_n]$ is feasible provided the
individual $f_n$ are computed by circuits of size $n^{O(1)}$.


4.2 An Example 

Consider the Boolean function $\mathrm{MAJ}(x_1,x_2,x_3,x_4,x_5)$, which takes the majority of five Boolean inputs. A first idea is to compute it using applications of OR and AND as follows: For every three-element subset $S = \{i,j,k\}$ of $\{1,2,3,4,5\}$, we compute $y_S = \mathrm{OR}(x_i,x_j,x_k)$. Define $y$ to be the AND of the values $y_S$ over the ten subsets $S$. Then $y = 1 \iff$ no more than 2 bits of $x_1,\dots,x_5$ are 0 $\iff \mathrm{MAJ}(x_1,x_2,x_3,x_4,x_5)$ is true. The complexity is counted as 11 operations and, importantly, 40 total arguments of those operations (three for each of the ten ORs and ten for the AND).
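The first idea can be checked exhaustively over all 32 inputs. The sketch below is only an illustration; the function names are hypothetical, and the reference majority is taken to mean "at least three of the five bits are 1".

```python
from itertools import combinations, product

def maj5_reference(bits):
    """Majority of five bits: 1 when at least three of them are 1."""
    return 1 if sum(bits) >= 3 else 0

def maj5_first_idea(bits):
    # y_S = OR of the three inputs in S, for each three-element subset S.
    y_s = [bits[i] | bits[j] | bits[k] for i, j, k in combinations(range(5), 3)]
    # y = AND of all ten values y_S.
    return 1 if all(y_s) else 0

all_inputs = list(product((0, 1), repeat=5))
agrees = all(maj5_first_idea(b) == maj5_reference(b) for b in all_inputs)
```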

On second thought, we can find a program of slightly lower complexity. 
Consider the Boolean circuit diagram in figure 4.1. 

Expressed as a sequence of operations, in one of many possible orders, the 
circuit is equivalent to the following straight-line program: 

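\[
\begin{aligned}
& v_1 = \mathrm{OR}(x_1,x_2,x_3), \quad v_2 = \mathrm{OR}(x_4,x_5), \\
& w_1 = \mathrm{AND}(x_1,x_2), \quad w_2 = \mathrm{AND}(x_1,x_3), \quad w_3 = \mathrm{AND}(x_2,x_3), \\
& w_4 = \mathrm{AND}(w_1,w_2), \quad w_5 = \mathrm{AND}(v_1,x_4,x_5), \\
& u = \mathrm{OR}(w_1,w_2,w_3), \quad t = \mathrm{AND}(u,v_2), \quad y = \mathrm{OR}(w_4,t,w_5).
\end{aligned}
\]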


This program has 10 operations and only 24 applications to arguments. To see that it is correct, note that $w_4$ is true if and only if $x_1 = x_2 = x_3 = 1$, and $w_5$ is true if and only if $x_4 = x_5 = 1$ and one of $x_1, x_2, x_3$ is 1. Finally, $t$ is true if and only if two of $x_1, x_2, x_3$ and one of $x_4, x_5$ are true, which handles the remaining six true cases.

Figure 4.1
Monotone circuit computing $\mathrm{MAJ}(x_1,x_2,x_3,x_4,x_5)$.
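The straight-line program can be transcribed gate by gate and compared with the majority function on every input. This is a sketch under the reading of the program in which $w_4 = \mathrm{AND}(w_1, w_2)$ and $t = \mathrm{AND}(u, v_2)$; the function name `maj5_trace` is an illustrative choice.

```python
from itertools import product

def maj5_trace(x1, x2, x3, x4, x5):
    """Run the ten-gate program and return (w4, w5, t, y)."""
    v1 = x1 | x2 | x3
    v2 = x4 | x5
    w1 = x1 & x2
    w2 = x1 & x3
    w3 = x2 & x3
    w4 = w1 & w2            # true exactly when x1 = x2 = x3 = 1
    w5 = v1 & x4 & x5       # x4 = x5 = 1 and at least one of x1, x2, x3
    u = w1 | w2 | w3        # at least two of x1, x2, x3
    t = u & v2              # ... and at least one of x4, x5
    y = w4 | t | w5
    return w4, w5, t, y

inputs = list(product((0, 1), repeat=5))
correct = all(maj5_trace(*b)[3] == (1 if sum(b) >= 3 else 0) for b in inputs)

# True inputs that neither w4 nor w5 catches, so only t can handle them.
only_t = [b for b in inputs
          if maj5_trace(*b)[3] == 1 and maj5_trace(*b)[0] == 0 and maj5_trace(*b)[1] == 0]
```

The list `only_t` collects the inputs with exactly two 1's among the first three bits and exactly one among the last two.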

Now clearly MAJ generalizes into a Boolean function MAJ of strings x of 
any length n, such that MAJ(x) returns 1 if more than n/2 of the bits of x are 
1. We ask the important question: 


Is MAJ feasible? 


The operational question about $\mathrm{MAJ}(x_1,x_2,x_3,x_4,x_5)$ is, do the above programs scale when “5” is replaced by “$n$”? Technically, scalable is the same idea as feasible but with the idea mentioned in chapter 2 that a polynomial bound is the same as having a linear bound each time the size of the input doubles.

The first idea, when generalized from “5” to “$n$,” says to take the AND of every $r$-sized subset of $[n]$, where $r = \lfloor n/2 \rfloor + 1$, and feed that to an OR. However, there are $\binom{n}{r}$ such subsets, which is exponential when $r \approx n/2$. So the first idea definitely does not scale.

The trouble with the second, shorter program is its being rather ad hoc for $n = 5$. The question of whether there are programs like it with only AND and OR gates that scale for all $n$ is a famous historical problem in complexity theory. The answer is known to be yes, but no convenient recipe for constructing the programs for each $n$ is known, and their size $O(n^{5.3})$ is comparatively high.

However, if we think in terms of numbers, we can build circuits that easily scale. Take $k = \lceil \log_2(n+1) \rceil$. We can hold the total count of 1’s in an $x$ of length $n$ in a $k$-bit register. So let us mentally draw $x$ going down rather than across and draw to its right an $n \times k$ grid, whose rows will represent successive values of this register. It is nicest to put the least significant bit leftmost, i.e., closest to $x$, but this is not critical. The row to the right of $x_1$ has value 0 (that is, $0^k$ as a Boolean string) if $x_1 = 0$ and value 1 (that is, $10^{k-1}$) if $x_1 = 1$. As we scan $x$ downward, if $x_i = 0$, then the row of $k$ bits has the same value as the previous one, but if $x_i = 1$, then the register is incremented by 1. We might not regard the increment operation as basic; instead, we might use extra “helper bits” and gates to compute binary addition with carries. But we can certainly tell that we will get a Boolean circuit of size $O(kn) = O(n \log n)$, which is certainly feasible. At the end, we need only compare the final value $v$ with $n/2$, and this is also feasible.
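The counting register can be sketched row by row. The version below is one possible realization, not the only one: it assumes each increment is done by a chain of half adders (one XOR and one AND per register bit), and it performs the final comparison with ordinary arithmetic rather than with gates. The function names are illustrative.

```python
def add_bit(register, bit):
    """Add one input bit to a register stored least significant bit first."""
    carry = bit
    out = []
    for r in register:
        out.append(r ^ carry)   # sum output of a half adder
        carry = r & carry       # carry into the next position
    return out

def count_rows(x):
    """Return the n successive register rows for the input bits x (n >= 1)."""
    n = len(x)
    k = n.bit_length()          # equals ceil(log2(n + 1)) for n >= 1
    row = [0] * k
    rows = []
    for bit in x:
        row = add_bit(row, bit)  # 2k binary gates per row, O(kn) in total
        rows.append(row)
    return rows

def register_value(row):
    return sum(b << i for i, b in enumerate(row))

def maj(x):
    """1 if more than half of the bits of x are 1, else 0."""
    final = register_value(count_rows(x)[-1])
    return 1 if 2 * final > len(x) else 0
```

For the input `[1, 0, 1, 1, 0]` the first row is `[1, 0, 0]`, matching the string $10^{k-1}$ with $k = 3$, and the last row holds the count 3.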

This idea broadens to any computation that we can imagine doing with paper and pencil and some kind of grid, such as multiplication, long division, and other arithmetic. To avoid the scratchwork and carries impinging on our basic grid, we can insist that they occupy $h$-many “helper rows” below the row with $x_n$, stretching wires down to those rows as needed. The final idea that helps in progressing to quantum computation is instead of saying we have $k(n+h)$ bits, say that we have $n+h$ bits that “evolve” going left to right. This also fits the classical picture articulated by Turing. The cells directly below $x_n$ in the column with the input can be regarded as a segment of “tape” for scratchwork. The change to this tape at each step of a Turing Machine computation on $x$, including changes to the scratchwork, can be recorded in an adjacent new column. If the machine computation lasts $t$ steps, then with $s = n + h$ as the measure of “space,” we have an $s \times t$ grid for the whole computation.

To finish touring machine and circuit models, we can next imagine that every cell in this grid depends on its neighbors in the preceding column by a fixed finite combination of basic Boolean operations. This gives us a circuit of size $O(st) = O(t^2)$ because we do at most one bit of scratchwork at each step. If the machine time $t(n)$ is polynomial, then so is $t(n)^2$.

This relation also goes in the opposite direction if you think of using a 
machine to verify the computation by the circuit—provided the circuits are 
uniform in a technical sense that captures the conceptual sense we applied 
above to Boolean function families. 

Hence, the criterion of feasible is broad and is the same for any of the classical models of computation by machines, programs, or (uniform) circuits. There is a huge literature on which functions are feasible and which are not. One can encode anything via Boolean strings, including circuits themselves. The problem of whether a Boolean circuit can ever output 1 (even when it allows applying NOT only to the original arguments $x_j$ and then has just one level of ternary OR gates feeding into a big AND) is not known to be feasible. That is, no feasible function is known to give the correct answer for every encoding $X$ of such a program: this is called the satisfiability problem and has a property called NP-hardness. We define this with regard to an equivalent problem about solving equations in chapter 16. For now we are happy with not only defining classical feasible computation in detail but also showing that equivalent criteria are reached from different models. Now we are ready for the quantum challenge to this standard.


4.3 Quantum Representation of Boolean Arguments 

Let $N = 2^n$. Every coordinate in $N$-dimensional Hilbert space corresponds to a binary string of length $n$. The standard encoding scheme assigns to each index $j \in [0,\dots,N-1]$ the $n$-bit binary string that denotes $j$ in binary notation, with leading 0’s if necessary. This produces the standard lexicographic ordering on strings. For instance, with $n = 2$ and $N = 4$, we show the indexing applied to a permutation matrix:



      00  01  10  11
00     1   0   0   0
01     0   1   0   0
10     0   0   0   1
11     0   0   1   0

The mapping is $f(00) = 00$, $f(01) = 01$, $f(10) = 11$, $f(11) = 10$, and in general $f(x_1,x_2) = (x_1, x_1 \oplus x_2)$. Thus, the operator writes the XOR into the second bit while leaving the first the same. One can also say that it negates the second bit if and only if the first bit is 1. This negation itself is represented on one bit by a matrix we have seen before; now with the indexing scheme, it is:



     0   1
0    0   1
1    1   0


Thus, the negation is controlled by the first bit, which explains the name “Controlled-NOT” (CNOT) for the whole $4 \times 4$ operation.
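As a small illustration of the indexing scheme, the sketch below builds the $4 \times 4$ matrix directly from the mapping $(x_1, x_2) \mapsto (x_1, x_1 \oplus x_2)$. The helper names are hypothetical, and the convention that the first bit is the most significant one is the lexicographic ordering described above.

```python
def bits_to_index(bits):
    """Read a tuple of bits as a binary number, first bit most significant."""
    index = 0
    for b in bits:
        index = 2 * index + b
    return index

def index_to_bits(index, n):
    """Inverse of bits_to_index for n-bit strings, with leading zeros."""
    return tuple((index >> (n - 1 - i)) & 1 for i in range(n))

def permutation_matrix(mapping, n):
    """Row x has its lone 1 in the column indexed by mapping(x)."""
    size = 2 ** n
    matrix = [[0] * size for _ in range(size)]
    for row in range(size):
        col = bits_to_index(mapping(index_to_bits(row, n)))
        matrix[row][col] = 1
    return matrix

def cnot_map(bits):
    x1, x2 = bits
    return (x1, x1 ^ x2)

CNOT = permutation_matrix(cnot_map, 2)
```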

To get a general Boolean function $y = f(x_1,\dots,x_n)$, we need $n+1$ Boolean coordinates, which entails $2N = 2^{n+1}$ matrix coordinates. What we really compute is the function

\[ F(x_1,\dots,x_n,z) = (x_1,\dots,x_n,\; z \oplus f(x_1,\dots,x_n)). \]

Formally, $F$ is a Boolean function with outputs in $\{0,1\}^{n+1}$ rather than just $\{0,1\}$. Its first virtue, which is necessary to the underlying quantum physics, is that it is invertible; in fact, $F$ is its own inverse:

\[ F(F(x_1,\dots,x_n,z)) = F(x_1,\dots,x_n,\, z \oplus y) = (x_1,\dots,x_n,\, (z \oplus y) \oplus y) = (x_1,\dots,x_n,\, z). \]

Its second virtue is having a $2N \times 2N$ permutation matrix $P_f$ that is easy to describe: the lone 1 in each row $x_1 x_2 \cdots x_n z$ is in column $x_1 x_2 \cdots x_n b$, where $b = z \oplus f(x_1,\dots,x_n)$.
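The construction of $P_f$ can be sketched for a small function. The example below uses the binary OR as $f$, an arbitrary choice made for illustration, and the helper names `extend`, `p_matrix`, and `matmul` are hypothetical. It writes out the full matrix, which is exactly what does not scale; the point is only to check the two virtues on a tiny case.

```python
from itertools import product

def extend(f, n):
    """Return F with F(x_1..x_n, z) = (x_1..x_n, z XOR f(x_1..x_n))."""
    def F(bits):
        x, z = bits[:n], bits[n]
        return x + (z ^ f(*x),)
    return F

def p_matrix(f, n):
    """Permutation matrix of F, rows and columns in lexicographic order."""
    strings = list(product((0, 1), repeat=n + 1))
    position = {s: i for i, s in enumerate(strings)}
    F = extend(f, n)
    matrix = [[0] * len(strings) for _ in strings]
    for s in strings:
        matrix[position[s]][position[F(s)]] = 1
    return matrix

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def or2(x1, x2):
    return x1 | x2

P_or = p_matrix(or2, 2)
identity8 = [[int(i == j) for j in range(8)] for i in range(8)]
is_self_inverse = matmul(P_or, P_or) == identity8
```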

If $f$ is a Boolean function with $m$ outputs $(y_1,\dots,y_m)$ rather than a single bit, then we have the same idea with

\[ F(x_1,\dots,x_n,z_1,\dots,z_m) = (x_1,\dots,x_n,\; z_1 \oplus y_1,\dots,z_m \oplus y_m) \]

instead. The matrix $P_f$ is still a permutation matrix, although of even larger dimensions $2^{n+m} \times 2^{n+m}$. Often left unsaid is what happens if we need $h$-many “helper bits” to compute the original $f$. The simple answer is that we can treat them all as extra outputs of the function, allocating extra $z_j$ variables as dummy inputs so that the $\oplus$ trick preserves invertibility. Because $h$ is generally polynomial in $n$, this does not upset feasibility.

In this scheme, adopting the “rows-and-columns” picture we gave for classical computation above, everything is laid out in $n' = n + m + h$ rows, with the input $x$ laid out in the first column. Each row is said to represent a qubit, which is short for quantum bit. In order to distinguish the row from the idea of a qubit as a physically observable object, we often prefer to say qubit line for the row itself in the circuit. The $h$-many helper rows even have their own fancy name as ancilla qubits, using the Latin word for “chambermaid” or, more simply, helper.

Writing out a big $2^n \times 2^n$ matrix, just for a permutation, is of course not feasible. This is a chief reason we prefer to think of operators $P_f$ as pieces of code. The qubit lines are really coordinates of binary strings that represent indices to these programs. These strings have size $n'$, and their own indices $1,\dots,n'$ are what we call quantum coordinates, when trying to be more careful than saying “qubits.” As long as we confine ourselves to linear algebra operations that are efficiently expressible via these $n'$ quantum indices, we can hope to keep things feasible. The rest of the game with quantum computation is, which operations are feasible?


4.4 Quantum Feasibility 

A quantum algorithm applies a series of unitary matrices to its start vector. Can 
we apply any unitary matrix we wish? The answer is no, of course not. If the 
quantum algorithms are to be efficient, then there must be a restriction on the 
matrices allowed. 

If we look at the matrices $P_f$ in section 4.3, we see several issues. First, the design of $P_f$ seems to take no heed of the complexity of the Boolean function $f$ but merely creates a permutation out of its exponential-sized truth table. Because infeasible (families of) Boolean functions exist, there is no way this alone could scale. Second, even for simple functions like $\mathrm{AND}(x_1,x_2,\dots,x_n)$, the matrix still has to be huge, even larger than $2^n$ on the side. How do we distinguish “basic feasible operations”? Third, what do we use for variables? If we have a $2^n$-sized vector, do we need exponentially many variables?

The answer is to note that if we keep the number $k$ of arguments for any operation to a constant, then $2^k$ stays constant. We can therefore use $2^k \times 2^k$ matrices that apply to just a few arguments. But what are the arguments? They are not the same as the Hilbert space coordinates $0,\dots,N-1$, which would involve us in exponentially many. The quantum coordinates start off being labeled $x_1,x_2,\dots,x_n$ as for Boolean input strings and extend to places for outputs and for ancillae, which is the plural of ancilla.

With these differences understood, the notion of feasible for unitary matrices is the natural extension of the one for Boolean circuits. Any unitary matrix $B$ of dimension $2^k$ where $k$ is constant (indeed, we will have $k \le 3$) is feasible. Such a matrix is allowed to operate on any subset of $k$ quantum coordinates, provided it leaves the other $n' - k$ coordinates alone. A tensor product of $B$ with identity matrices on the other quantum coordinates is a basic matrix. We could require that the entries of $B$ be simple in some way, but it will suffice to take $B$ from a small fixed finite family of gates.
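A basic matrix can be sketched for the simplest case $k = 1$: a single-qubit gate on one quantum coordinate, tensored with identities on all the others. The helper names `kron` and `basic_matrix` are illustrative, and the sketch assumes the first quantum coordinate is the most significant bit of the Hilbert-space index.

```python
def kron(a, b):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]   # the one-bit negation matrix

def basic_matrix(gate, position, n):
    """Apply `gate` to coordinate `position` (1-based) of n, identity elsewhere."""
    result = [[1]]
    for line in range(1, n + 1):
        result = kron(result, gate if line == position else I2)
    return result

# Negation on the second of three quantum coordinates: an 8 x 8 matrix.
X_on_2_of_3 = basic_matrix(X, 2, 3)
```

The resulting matrix sends the basis vector for $x_1 x_2 x_3$ to the one for $x_1 \bar{x}_2 x_3$. Its size grows as $2^n$, but its description (which gate, on which coordinate) stays small.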

Now suppose that $U$ is any unitary matrix of dimension $N$. Then we will say that it is feasible provided there is a way to construct it easily out of basic matrices. Technically, we mean that $U$ belongs to an infinite family $[U_n]$ parameterized by $n$, with each $U_n$ constructible from $n^{O(1)}$ basic matrices in a uniform manner. This stipulation is asymptotic, but the intent is concrete. To show concreteness, one can express $U$ via a quantum circuit of basic gate matrices.

We will stay informal with quantum circuits as we did for Boolean circuits while formalizing quantum computations in terms of matrices. Rather than grids with squares as we have described for Boolean circuits, quantum circuits use lines that go across like staves of music and place gates on the lines like musical notes and chords. The first $n$ lines correspond to the inputs $x_1,\dots,x_n$, while all other qubit lines are conventionally initialized to 0. The only “crossing wires” are parts of multi-ary gates, either running invisibly inside boxes or shown explicitly for some gates like the CNOT operation above.

Here is a circuit composed of one Hadamard gate on qubit line 1, followed 
by a CNOT with its control on line 1 and its target on line 2: 


x_1 ───[H]───●─── y_1
             │
x_2 ─────────⊕─── y_2


Underneath the Hadamard gate is an invisible identity gate, expressing that in 
the first time step, the second qubit does not change. We could draw this into 
the circuit if we wish: 



x_1 ───[H]───●─── y_1
             │
x_2 ───[I]───⊕─── y_2


Whenever two gates can be placed vertically this way, a tensor product is involved. Thus, the matrix form of the computation is the composition $V_2 \cdot V_1$ of the $4 \times 4$ matrices

\[
V_2 = \mathrm{CNOT} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix},
\qquad
V_1 = H \otimes I = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix}.
\]


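To show their product $U = V_2 V_1$ acting on the input vector $x = [1,0,0,0]^T$, we obtain

\[
\frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & -1 \\ 1 & 0 & -1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix}. \tag{4.1}
\]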


Expressed in quantum coordinates, the input vector denotes the input string 00, that is, $x_1 = 0$ and $x_2 = 0$. The output vector is not a basis vector. Rather, it is an equal-weighted sum of the basis vector for 00 and the basis vector for 11. This means we do not have a unique output string $y$ in quantum coordinates, although we have a simple output vector $v$ in the $N = 2^2 = 4$ Hilbert-space coordinates. Much of the power of quantum algorithms will come from such outputs $v$, from which we need further interaction in the form of measurements, even repeatedly, to arrive at a final Boolean output $y$. Thus, we hold off saying what it means for a Boolean function to be quantum feasibly computable, but we have enough to define formally what it means for a quantum computation to be feasible:


Definition 4.2 A quantum computation $C$ on $s$ qubits is feasible provided

\[ C = U_t U_{t-1} \cdots U_1, \]

where each $U_i$ is a feasible operation, and $s$ and $t$ are bounded by a polynomial in the designated number $n$ of input qubits.
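The two-step circuit with a Hadamard gate followed by CNOT is a tiny instance of such a composition, and equation (4.1) can be checked numerically. The sketch below uses plain lists and floating-point arithmetic; the helper names `kron` and `matvec` are illustrative.

```python
from math import sqrt

def kron(a, b):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def matvec(m, v):
    """Matrix times column vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

s = 1 / sqrt(2)
H = [[s, s], [s, -s]]
I2 = [[1, 0], [0, 1]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

V1 = kron(H, I2)        # first time step: H on line 1, identity on line 2
V2 = CNOT               # second time step
x = [1, 0, 0, 0]        # basis vector for the string 00

out = matvec(V2, matvec(V1, x))   # apply V1 first, then V2
```

The result has equal weight on the coordinates for 00 and 11 and zero weight on 01 and 10.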

Which quantum gates $B$ are basic, and which possibly other operations $V$ on the whole space are feasible? We will not try to give a comprehensive answer to this question, but in the next chapter, we give some more gates that everyone agrees are basic and some operations that almost everyone agrees are feasible. They are building blocks in the same way that the Boolean operations NOT, AND, OR, and XOR are. It is believed, and certainly hoped, that such matrices will one day be constructible. Moreover, quantum computers may prove to be makable out of replicable parts the way classical computers, such as your laptop, are built today.


4.5 Problems 

4.1. Show that XOR cannot be written using only the composition of AND and 
OR functions. 

4.2. Suppose that $f(x)$ and $g(x)$ are Boolean functions on $n$ inputs. Let

\[ h(x) = f(x) \oplus g(x). \]

Prove that $h$ is always zero if and only if $f$ and $g$ are the same function.

4.3. For a Boolean string $x = x_1,\dots,x_n$ define $(-1)^x$ to be

\[ (-1)^{x_1} (-1)^{x_2} \cdots (-1)^{x_n}. \]

Show that $(-1)^x$ is equal to 1 if and only if $\mathrm{XOR}(x) = 0$.

4.4. Let $x = x_1,\dots,x_n$ and $y = y_1,\dots,y_n$ be Boolean strings. Prove that

\[ (-1)^{x \oplus y} = (-1)^x \times (-1)^y. \]


4.5. Does (“If my always equal (-1 )■'*-' ? 

4.6. Show that there are an uncountable number of unitary matrices of dimension $N$. Does this help explain why we cannot allow any unitary matrix?

4.7. Show that $x_1 \oplus \cdots \oplus x_n$ can be formed from $O(n)$ binary Boolean operations.

4.8. What effect does the following $4 \times 4$ matrix have on the quantum coordinates, that is, on the four basis vectors for the strings 00, 01, 10, and 11?

\[ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

We include it as a basic gate.


4.9. Now show how to use the swap gate of problem 4.8 to obtain the following action on the standard basis vectors: the string $x_1 x_2 \cdots x_n$ becomes $x_2 \cdots x_n x_1$. Call this the cycle $K_n$. How many swap gates did you need?


4.10. Consider a quantum circuit with three qubit lines. We can draw a CNOT gate whose control is separated from its target; indeed, we can also place it “upside down” as:

x_1 ───⊕─── y_1
       │
x_2 ───┼─── y_2
       │
x_3 ───●─── y_3


We have omitted showing an identity gate on the second qubit line, which would look ugly when crossed by the CNOT gate’s vertical wire anyway. Write out the $8 \times 8$ matrix of the operation represented by this (piece of a) quantum circuit.

4.11. Show that the matrix in problem 4.10 cannot be written as a tensor product of two smaller matrices, in particular not of some permutation of CNOT and the $2 \times 2$ identity matrix. This seems to violate our definition of a basic matrix $U_i$ in a feasible quantum computation, but see the next problem.

4.12. Show nevertheless that the circuit in problem 4.10 can be simulated by three quantum time-steps, each a tensor product of the $2 \times 2$ identity and a basic $4 \times 4$ matrix. Argue generally that $k$-qubit gates can be oriented in any way desired on any $k$ qubit lines without upsetting definition 4.2. Hint: Use problem 4.8.

4.13. Consider the general construction in section 4.3 of a unitary permutation matrix $P_f$ from a Boolean function $f$. For what function $f$ does CNOT equal $P_f$?

4.14. Determine the $8 \times 8$ matrix of $P_f$ where $f$ is the binary AND function. (You do not have to write every 0.) This gate, which we write as TOF, is named for Tommaso Toffoli.


4.6 Summary and Notes 

This chapter has presented the basic ingredients of Boolean complexity and quantum complexity side by side for comparison. In both cases, there is a common notion of feasible associated with complexity cost measures being bounded by some polynomial in the size of the data. We have presented the Boolean circuit model in both of its equivalent formulations via circuits and straight-line programs, and while we support viewing quantum computations as circuits, we defined them as programs giving compositions of basic matrix operations.

There are many books on Boolean complexity; check one out if you need more background here. General textbooks on computation theory also include concepts such as machine models, decision problems, and (un)computability. Among them we suggest the texts by Sipser (2012) and Homer and Selman (2011). The second author has co-written three book chapters on the basics of complexity theory (Allender et al., 2009a,b,c). The first of these chapters includes a diagram of the simulation of Turing machines by Boolean circuits (with $st$ size overhead) in the form of Savage (1972); an $O(t \log s)$-size simulation was proved by Pippenger and Fischer (1979), but these circuits do not have the same degree of spatial locality. The theorem about monotone programs for majority was proved by Valiant (1984). That quantum operations can simulate Turing machines was first observed by Benioff (1982).

The CNOT gate and some other quantum gates go all the way back to Feynman (1982, 1985) and Deutsch (1985, 1989), while Yao (1993) systematized quantum circuit theory, and Barenco et al. (1995) gave an influential roundup of basic gates. Universality results about small gate sets followed (Barenco et al., 1995, DiVincenzo, 1995, Lloyd, 1995). The Toffoli gate comes from Toffoli (1980) and Fredkin and Toffoli (1982).


Special Matrices


Given our view that quantum algorithms are simply the result of applying a 
unitary transformation to a unit vector, it should come as no surprise that we 
need to study unitary matrices. Happily, there are just a few families of such 
matrices that are used in most quantum algorithms. We will present those in 
this chapter. 

Two of the families correspond to transforms that are well studied through 
mathematics and computer science theory and have many applications in many 
areas besides quantum algorithms. When is a transformation a transform ? The 
latter term connotes that the output is a new way of interpreting the input. 
Because all quantum transformations are invertible, this is in a sense always 
true, but the intuition is highest for the families presented here. 


5.1 Hadamard Matrices 


The first family of unitary transforms are the famous Hadamard matrices. Note that because we mainly stay with the standard basis of vectors, we will identify transforms with their matrices, and this should cause no confusion. Here we lock in our convention that $N$ is always $2^n$ for some $n$.

Definition 5.1 The Hadamard matrix $H_N$ of order $N$ is recursively defined by $H_2 = H$ and for $N \ge 4$:

\[
H_N = H_{N/2} \otimes H = \frac{1}{\sqrt{2}} \begin{bmatrix} H_{N/2} & H_{N/2} \\ H_{N/2} & -H_{N/2} \end{bmatrix}.
\]

We could also use $H_1 = [1]$ as the basis. If we wish to use $n$ not $N$ as a marker, then we write $H^{\otimes n}$ using a superscript instead of a subscript.


This recursive definition implies many important facts about this matrix. For example, it easily implies that, in general, $H_N$ is equal to $\frac{1}{\sqrt{N}} A$ where $A$ is a matrix of $\pm 1$ only. However, it is often much more useful to have the following direct definition of the entries of $H_N$.


Lemma 5.2 For any row $r$ and column $c$,

\[ H_N[r,c] = \frac{1}{\sqrt{N}} (-1)^{r \bullet c}, \]

recalling that $r \bullet c$ is the inner product of $r$ and $c$ treated as Boolean strings. □

Thus, for any vector $a$, the vector $b = H_N a$ is defined by

\[ b(x) = \frac{1}{\sqrt{N}} \sum_{t=0}^{N-1} (-1)^{x \bullet t} a(t). \]

This is the way that we will view the transform in the analysis of most algorithms. Note the convenience of using Boolean strings as index arguments.

In a quantum circuit with $n$ qubit lines, $H_N$ is shown as a column of $n$-many single-qubit Hadamard gates. This picture frees one from having to think of tensor products in the design of a circuit but does not further our analysis.
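The recursive definition and the entrywise formula can be compared numerically for small $n$. The sketch below starts the recursion from $H_1 = [1]$ and includes the $1/\sqrt{N}$ normalization in each entry; the function names are illustrative, and row and column indices are ordinary integers whose binary expansions play the role of the Boolean strings.

```python
from math import sqrt

def kron(a, b):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def hadamard(n):
    """Build the n-fold tensor power of H, starting from the 1 x 1 matrix [1]."""
    s = 1 / sqrt(2)
    H = [[s, s], [s, -s]]
    result = [[1.0]]
    for _ in range(n):
        result = kron(result, H)
    return result

def boolean_inner_product(r, c):
    """Parity of the bitwise AND of r and c."""
    return bin(r & c).count("1") % 2

def hadamard_entry(r, c, n):
    """Direct formula: (-1)^(r . c) divided by sqrt(N), with N = 2^n."""
    return (-1) ** boolean_inner_product(r, c) / sqrt(2 ** n)

def hadamard_transform(a, n):
    """b(x) = (1/sqrt(N)) * sum over t of (-1)^(x . t) * a(t)."""
    N = 2 ** n
    return [sum(hadamard_entry(x, t, n) * a[t] for t in range(N)) for x in range(N)]
```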


5.2 Fourier Matrices 

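The next important family consists of the quantum Fourier matrices. Let $\omega$ stand for $e^{2\pi i/N}$, which is often called “the” principal $N$-th root of unity.

Definition 5.3 The Fourier matrix $F_N$ of order $N$ is:

\[
F_N = \frac{1}{\sqrt{N}} \begin{bmatrix}
1 & 1 & 1 & 1 & \cdots & 1 \\
1 & \omega & \omega^2 & \omega^3 & \cdots & \omega^{N-1} \\
1 & \omega^2 & \omega^4 & \omega^6 & \cdots & \omega^{N-2} \\
1 & \omega^3 & \omega^6 & \omega^9 & \cdots & \omega^{N-3} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
1 & \omega^{N-1} & \omega^{N-2} & \omega^{N-3} & \cdots & \omega
\end{bmatrix}
\]

That is, $F_N[i,j] = \frac{1}{\sqrt{N}}\, \omega^{ij \bmod N}$.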
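Definition 5.3 can be explored numerically for a small $N$. The sketch below builds the matrix entry by entry with the $1/\sqrt{N}$ factor included and checks, up to floating-point error, that multiplying it by its conjugate transpose gives the identity. The helper names are illustrative.

```python
import cmath
from math import sqrt

def fourier_matrix(N):
    """Entry [i][j] is omega^(i*j mod N) / sqrt(N), with omega = e^(2*pi*i/N)."""
    omega = cmath.exp(2j * cmath.pi / N)
    return [[omega ** ((i * j) % N) / sqrt(N) for j in range(N)] for i in range(N)]

def conjugate_transpose(m):
    return [[m[j][i].conjugate() for j in range(len(m))] for i in range(len(m[0]))]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

F8 = fourier_matrix(8)
product = matmul(F8, conjugate_transpose(F8))

# Largest deviation of the product from the 8 x 8 identity matrix.
max_error = max(abs(product[i][j] - (1 if i == j else 0))
                for i in range(8) for j in range(8))
```

This writes out all $N^2$ entries, so it says nothing about feasibility; it only illustrates the definition.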

It is well known that $F_N$ is a unitary matrix over the complex Hilbert space. This and further facts about $F_N$ are set as exercises at the end of this chapter, including a running theme about its feasibility via various decompositions. For any vector $a$, the vector $b = F_N a$ is defined in our index notation by:

\[ b(x) = \frac{1}{\sqrt{N}} \sum_{t=0}^{N-1} \omega^{xt} a(t). \]

This is the way that we will view the transform in our algorithmic analysis. That this is tantalizingly close to the equation for the Hadamard transform was significant to Peter Shor in his step from Daniel Simon’s algorithm (chapter 10) to his own (chapter 11). The differences are having $\omega$ in place of $-1$ and multiplication $xt$ in place of the Boolean inner product $x \bullet t$.


5.3 Reversible Computation and Permutation Matrices

Every $N \times N$ permutation matrix is unitary. However, in terms of $n$ with $N = 2^n$, there are doubly exponentially many permutation matrices. Hence, not all of them can be feasible; indeed, most of them are concretely infeasible. Which permutation matrices are feasible?

We can give a partial answer. Recall the definition of the permutation matrix $P_f$ from the invertible extension $F$ of a Boolean function $f$ in section 4.3.

THEOREM 5.4 All classically feasible Boolean functions $f$ have feasible quantum computations in the form of $P_f$.

The proof of this theorem stays entirely classical; that is, the quantum circuits are the same as Boolean circuits that are reversible, which in turn efficiently embed any given Boolean circuit computing $f$. We need only one new gate, which was already mentioned in problem 4.14.

Definition 5.5 The Toffoli gate is the ternary Boolean function

\[ \mathrm{TOF}(x_1,x_2,x_3) = (x_1,\, x_2,\, x_3 \oplus (x_1 \wedge x_2)). \]

The Toffoli gate induces the permutation in 8-dimensional Hilbert space that swaps the last two entries, which correspond to the strings 110 and 111, and leaves the rest the same. This extends the idea of CNOT with $x_1, x_2$ as “controls” and $x_3$ as the “target.” That this simple swap is universal for Boolean computation is conveyed by the following two facts for Boolean bit arguments $a, b$:

• $\mathrm{NOT}(a) = \mathrm{TOF}(1,1,a)$.

• $\mathrm{AND}(a,b) = \mathrm{TOF}(a,b,0)$.
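Both facts are easy to check on bits. In the sketch below, the value of NOT or AND is read from the third (target) output of the gate, which is the interpretation assumed here; the function names are illustrative.

```python
from itertools import product

def tof(x1, x2, x3):
    """Toffoli gate: flip the target x3 exactly when both controls are 1."""
    return (x1, x2, x3 ^ (x1 & x2))

def not_via_tof(a):
    return tof(1, 1, a)[2]

def and_via_tof(a, b):
    return tof(a, b, 0)[2]

# The only inputs the gate changes are 110 and 111, which it swaps.
moved = [bits for bits in product((0, 1), repeat=3) if tof(*bits) != bits]
```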

Proof of Theorem 5.4. Because AND and NOT form a universal set of logic gates, we may start with a Boolean circuit $C$ computing $f(x_1,\dots,x_n)$ using $r$-many NOT and $s$-many binary AND gates. The NOT gates we can leave alone because we already have the corresponding $2 \times 2$ matrix $X$ as a basic quantum operation. Hence, we need only handle the $s$-many AND gates. We can simulate them by $s$-many Toffoli gates each with an ancilla line set to 0 for input, but this is superseded by the issue of possibly needing multiple copies of the result $c$ of an AND gate on lines $a, b$, that is, one for each wire out of the gate.

This is where the Toffoli gate shines. For each output wire $w$, we allocate a fresh ancilla $z$ and put a Toffoli gate with target on line $z$ and controls on $a$ and $b$. This automatically computes $z \oplus (a \wedge b)$, which with $z$ initialized to 0 is what we want. Multiple Toffoli gates with the same controls do not affect each other. Hence, the overhead is bounded by the number of wires in $C$, which is polynomial, and the only ancilla lines we need already obey the convention of being initialized to 0. □
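The fan-out step of the proof can be sketched on plain bit lists. This is only an illustration of one AND gate with several output wires, not a full circuit compiler; the function names and the layout of the lines (controls first, then the ancillas) are assumptions made for the example.

```python
def toffoli(state, c1, c2, target):
    """Return a copy of `state` with line `target` XORed by (c1 AND c2)."""
    new = list(state)
    new[target] ^= state[c1] & state[c2]
    return new

def and_with_fanout(a, b, copies):
    """Compute `copies` copies of a AND b, one fresh ancilla per output wire."""
    state = [a, b] + [0] * copies          # ancillas start at 0
    for k in range(copies):
        state = toffoli(state, 0, 1, 2 + k)  # same controls, distinct targets
    return state
```

For example, `and_with_fanout(1, 1, 3)` leaves the controls unchanged and sets all three ancillas to 1, and running the same gates a second time returns the ancillas to 0.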

There are versions of theorem 5.4, some applying to Turing machines and other starting models of computation, that have much less overhead, but “polynomial” is good enough for our present discussion of feasibility. Thus, a permutation matrix, which is a deterministic quantum operation, is feasible if it is induced by a classical feasible function on the quantum coordinates.


5.4 Feasible Diagonal Matrices 


Any diagonal matrix whose entries have absolute value 1 is unitary. Hence, 
it can be a quantum operation. The question is, which of these operations are 
feasible ? 

Of course if the size of the matrix is a small fixed number, we can call it basic and hence feasible. What happens when the matrices are $N \times N$, however? Even if we limit to entries 1 and $-1$, we have one such matrix $U_S$ for every subset $S$ of $[N]$, that is, $S \subseteq \{0,1\}^n$:

\[
U_S[x,x] = \begin{cases} -1 & \text{if } x \in S; \\ 1 & \text{otherwise.} \end{cases}
\]


Because there are doubly exponentially many $S$, there are doubly exponentially many $U_S$, so most of them are not feasible. But can we tell which are feasible? Again we give a partial answer. When $S$ is the set of arguments that make a Boolean function $f$ true, we write $U_f$ in place of $U_S$. The matrix $U_f$ is called the Grover oracle for $f$.

THEOREM 5.6 If $f$ is a feasible Boolean function, then its Grover oracle $U_f$ is feasible.
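The action of a Grover oracle on a vector can be sketched by storing only its diagonal. The code below lists all $2^n$ diagonal entries explicitly, so it illustrates the definition of $U_f$ rather than the feasible construction that the theorem promises; the function names are hypothetical.

```python
from itertools import product

def grover_oracle_diagonal(f, n):
    """Diagonal of U_f in lexicographic order: -1 where f is true, 1 elsewhere."""
    return [-1 if f(*bits) else 1 for bits in product((0, 1), repeat=n)]

def apply_diagonal(diag, vector):
    """Multiply a vector by the diagonal matrix with the given diagonal."""
    return [d * v for d, v in zip(diag, vector)]

def and2(a, b):
    return a & b

# For the binary AND, only the coordinate for the string 11 is negated.
diag_and = grover_oracle_diagonal(and2, 2)
```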


We defer the proof until section 6.5 in the next chapter. As with theorem 5.4, the question of whether any other families of functions $f$ make $U_f$ meet our quantum definition of feasible is a deep one whose answer is long unknown and is related to issues in chapter 16. We can be satisfied for now that we have a rich vocabulary of feasible operations, and the next chapter will give some tricks for combining them. Here we give one more family of operations.


5.5 Reflections 


Given any unit vector $a$, we can create the unitary operator $\mathrm{Ref}_a$, which reflects any other unit vector $b$ around $a$. Geometrically, this is done by dropping a line from the tip of $b$ that hits the body of $a$ in a right angle and continuing the line the same distance further to a point $b'$. Then $b'$ likewise lies on the unit sphere of the Hilbert space. The operation mapping $b$ to $b'$ preserves the unit sphere and is its own inverse, so it is unitary.

In geometrical terms, the point on the body of $a$ is the projection of $b$ onto $a$ and is given by $a' = a\langle a,b\rangle$. Thus,

\[ b' = b - 2(b - a\langle a,b\rangle) = (2P_a - I)b, \]

where $P_a$ is the operator doing the projection: for all $b$,

\[ P_a b = a\langle a,b\rangle. \]


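For example, let $a$ be the unit vector with entries $\frac{1}{\sqrt{N}}$, which we call $j$. Then the projector is the matrix whose entries are all $\frac{1}{N}$, which we call $J$ in our matrix font. Finally, the reflection operator is

\[
V = 2J - I = \begin{bmatrix}
\frac{2}{N} - 1 & \frac{2}{N} & \cdots & \frac{2}{N} \\
\frac{2}{N} & \frac{2}{N} - 1 & \cdots & \frac{2}{N} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{2}{N} & \frac{2}{N} & \cdots & \frac{2}{N} - 1
\end{bmatrix}.
\]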

We claim this matrix is feasible. Of course this leads to the question: which 
reflection operations are feasible? 

An important case of reflection is when $a$ is the characteristic vector of a nonempty set $S$, that is:

\[
a(x) = \begin{cases} \frac{1}{\sqrt{|S|}} & \text{if } x \in S; \\ 0 & \text{otherwise.} \end{cases}
\]

Suppose we apply $\mathrm{Ref}_a$ to vectors $b$ with the foreknowledge that all entries $e = b(x)$ for $x \in S$ are equal. Let $k = |S|$. Then we have $\langle a,b\rangle = ke/\sqrt{k} = e\sqrt{k}$, and taking the projection $a' = P_a b$, we have

\[
a'(x) = \begin{cases} e & \text{if } x \in S; \\ 0 & \text{otherwise.} \end{cases}
\]

The reflection $b' = 2a' - b$ thus satisfies

\[
b'(x) = \begin{cases} b(x) & \text{if } x \in S; \\ -b(x) & \text{otherwise,} \end{cases}
\]

because in the case $x \in S$, $b'(x) = 2e - b(x) = 2e - e = e = b(x)$. Then the action is the same as multiplying by the diagonal matrix that has $-1$ for the coordinates that are not in $S$, that is, by the Grover oracle for the complement of $S$. Because the negation of a feasible Boolean function is feasible, this together with the case of $V$ implies:


Theorem 5.7 For all feasible Boolean functions $f$, provided we restrict to the linear subspace of argument vectors whose entries indexed by the “true set” $S_f$ of $f$ are equal, reflection about the characteristic vector of $S_f$ is a feasible quantum operation. □


Happily, the set of such argument vectors forms a linear subspace and always contains the vector $j$, which we will use as a “start” vector. Moreover, reflections by $a$ and $b$, when applied to vectors already in the linear subspace spanned by $a$ and $b$, stay within that subspace. We will use this when presenting Grover’s algorithm and search by quantum random walks, but that is getting ahead of our story.
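The reflection formula $b' = 2a\langle a,b\rangle - b$ can be tried on small real vectors. The sketch below assumes real entries (so no complex conjugation is needed in the inner product) and uses hypothetical helper names; the set `S` and the test vector are arbitrary choices whose entries on `S` are all equal, as the derivation requires.

```python
from math import sqrt

def reflect(a, b):
    """Return b' = 2 a <a, b> - b for real vectors, with a a unit vector."""
    inner = sum(ai * bi for ai, bi in zip(a, b))
    return [2 * ai * inner - bi for ai, bi in zip(a, b)]

def characteristic_vector(S, N):
    """Unit vector with entries 1/sqrt(|S|) on S and 0 elsewhere."""
    k = len(S)
    return [1 / sqrt(k) if x in S else 0.0 for x in range(N)]

# Reflection about j, the vector with all entries 1/sqrt(N), here with N = 4.
j = [0.5, 0.5, 0.5, 0.5]
column0 = reflect(j, [1, 0, 0, 0])      # first column of 2J - I

# Reflection about the characteristic vector of S on a vector constant on S.
S = {1, 4, 6}
a = characteristic_vector(S, 8)
b = [0.5, 0.2, -0.3, 0.1, 0.2, 0.4, 0.2, -0.1]
b_reflected = reflect(a, b)
```

In the second example the entries indexed by `S` are unchanged and all other entries change sign, which is the diagonal action described before theorem 5.7.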


5.6 Problems 

5.1. What is $H_4$?

5.2. Prove that $H_N$ is a unitary matrix.

5.3. Prove lemma 5.2.

5.4. Prove that $F_N$ is a unitary matrix. Note that because it is a complex-valued matrix, this means that it times its conjugate transpose is the identity matrix.

5.5. Let $D_N$ be the diagonal matrix formed by the second column of $F_N$. Show
how to write $D_N$ as a tensor product of the $2 \times 2$ twist matrices $T_\alpha$ defined in
problem 3.10.

5.6. Recalling also the cycle operation $K_n$ from problem 4.9, show that $F_N$
obeys the following recursive equation in block matrices:

$$F_N = \frac{1}{\sqrt{2}} \begin{bmatrix} I^{\otimes(n-1)} & D_{N/2} \\ I^{\otimes(n-1)} & -D_{N/2} \end{bmatrix} \begin{bmatrix} F_{N/2} & 0 \\ 0 & F_{N/2} \end{bmatrix} K_n^{-1}.$$

This ultimately shows how to decompose $F_N$ into swaps, controlled twists, and
the Hadamard gates at the base of this recursion.

5.7. Let $f: \{0,1\}^n \to \{0,1\}^n$ be a Boolean function. Show that the following
function is always invertible:

$$g(x,y) = (x, y \oplus f(x)).$$

5.8. Prove that the following sum is zero:

$$\sum_{k=0}^{N-1} \omega^k,$$

where $\omega$ is $e^{2\pi i/N}$. For what values of $\ell$ is the following sum equal to zero:

$$\sum_{k=0}^{N-1} \omega^{k\ell}\,?$$


5.9. Show that the product of two permutation matrices is again a permutation 
matrix. Also show that a permutation matrix is unitary. 

5.10. A Fredkin gate, named for Edward Fredkin, swaps 101 and 110 while
leaving the other six arguments fixed. Show that, like the Toffoli gate, it is 
universal for reversible computation. 

5.11. Show by direct calculation that the reflection matrix $2J - I$ is unitary.

5.12. Our use of the term characteristic vector in section 5.5 may appear to 
clash with the standard term in linear algebra, which we prefer to call an eigen¬ 
vector. Show, however, that the meanings do harmonize, namely, the charac¬ 
teristic vector is an eigenvector of some relevant quantum operations. 

For the following problems, define two vectors to be dyadically orthogonal
if their entry-wise products cancel in pairs. That is, two vectors $a, b$ in $\mathbb{C}^n$,
where $n$ is necessarily even, are dyadically orthogonal if $[n]$ can be partitioned
into two-element subsets $\{i,j\}$ such that $a(i)b(i) = -a(j)b(j)$. Of course two
dyadically orthogonal vectors are orthogonal. Here are some pairs of dyadically
orthogonal real and complex vectors:

$$\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \end{bmatrix}, \qquad \begin{bmatrix} 9 & 4 & 6 & -15 \\ 5 & -3 & 2 & 3 \end{bmatrix}, \qquad \begin{bmatrix} 1 & i & 1 & 1 \\ i & 1 & 1 & -1 \end{bmatrix}.$$

Next, call a matrix dyadically unitary if it is unitary and, in addition, every 
two distinct rows are dyadically orthogonal and likewise every two distinct 
columns. Every 2x2 unitary matrix is dyadically unitary, so the concept 
becomes distinctive starting with 4x4 matrices. 

5.13. Show that every Hadamard matrix $H_N$ is dyadically unitary.

5.14. Show that the quantum Fourier transform matrices $F_N$ are dyadically
unitary.

5.15. Show that the tensor product of a dyadically unitary matrix with any 
unitary matrix is dyadically unitary. 

5.16. Show that dyadic unitarity is (alas) not closed under composition, that
is, under matrix product. In particular, find a $4 \times 4$ dyadically unitary matrix
$A$ such that $A^2$, while necessarily unitary, is not dyadically so. (Hint: First do
problem 3.19 in chapter 3.)

The following exercises give more understanding of quantum coordinates
and embody a research avenue we two authors began toward deeper analysis
of the quantum Fourier transform. We generalize the notion of substring for any
subset $I$ of $\{1, \dots, n\}$, $I = \{i_1, i_2, \dots, i_r\}$ in order, by putting
$x_I = x_{i_1} x_{i_2} \cdots x_{i_r}$ for any $x \in \{0,1\}^n$. For any such $I$ and binary
string $w$ of length $r$, define

$$S_{I,w} = \{x \mid x_I = w\}.$$

Under the standard binary order of complex indices, we regard $S_{I,w}$ also as a
subset of $\{0, \dots, N-1\}$ (where $N = 2^n$ as usual) and call $S_{I,w}$ a cylinder. Sets
of the form $S_I = S_{I,0^r}$, where $r = |I|$, are principal cylinders.

5.17. Show that for any $r$, the first $R = 2^r$ complex indices form a principal
cylinder.

5.18. Write out the members of the cylinder for $n = 5$, $I = \{2,3,5\}$, and
$w = 101$. How can we recognize the numbers in binary notation?

Now define an $N \times N$ matrix $M$ to be unitarily decomposable along a set $S$
of rows, where $R = |S|$ divides $N$, if the columns of $M$ can be partitioned into
$N/R$-many subsets $T_0, \dots, T_{N/R-1}$, each of size $R$, such that for all $j < N/R$, the
$R \times R$ sub-matrix $M[S, T_j]$ (which is formed by the rows in $S$ and the columns
in $T_j$) is unitary up to a scalar multiple. The “blocks” $T_j$ need not consist of
consecutive columns.

5.19. Show that for any principal cylinder $S$, the Hadamard matrices $H_N$ are
unitarily decomposable along $S$, where the sub-matrices are also Hadamard
matrices up to factors of $\sqrt{2}$. Is this true of any cylinder? Find a set of four
rows in $H_8$ that have no unitary decomposition.

5.20. Show that for any principal cylinder $S$, the quantum Fourier transform
$F_N$ is unitarily decomposable along $S$.

Hint: The proof we know works by induction on $N = 2^n$, dividing into cases
according to whether the $n$th qubit belongs to $I$. For this we have found it
convenient to prove and maintain the stronger inductive hypothesis that every
submatrix $U_j = M[S, T_j]$ is dyadically unitary and, moreover, that they are all
related by powers of a unitary diagonal matrix $D$ on the left. That is,
$U_j = D^{p(j)} U_0$, where however the powers $p(j)$ need not be consecutive integers.


5.7 Summary and Notes 

This chapter has presented the most important operations for quantum compu¬ 
tation. It has given us a vocabulary of feasible quantum operations. In particu¬ 
lar: 

1. The special matrices Hadamard and Fourier are all feasible. 

2. Permutation matrices allowed by the reversibility theorem are feasible pro¬ 
vided the corresponding Boolean function is classically feasible. 

3. The Grover oracle of a classically feasible Boolean function is feasible. 

A curious piece of history is that the family we call the Hadamard matrices 
were not discovered by Jacques Hadamard but by Joseph Sylvester. The trans¬ 
form is also named for Joseph Walsh and/or Hans Rademacher. Mathematical 
history can be complex. 

The important results on reversible computation are due to Bennett (1973)
and Lecerf (1963), who discovered them independently years before quantum
algorithms were even envisioned. Their motivation was to study the limits of
reversible computations for their own sake. At one time, it was thought that
computations had to destroy information: for example, every assignment operation
destroys the previous contents of a memory location and so causes a loss
of information. Now we know that any computation can be made reversible,
the destruction of information is not required for computation, and, even better,
reversibility does not greatly increase computational cost. The Fredkin gate is
from Fredkin and Toffoli (1982).

Dyadic unitarity is an original concept whose point is realized in prob¬ 
lem 5.20. The theorem there about decompositions of QFT is original work 
by the second author.


Tricks


There are several tricks of the trade, that is, tricks that are used in quantum 
algorithms, which are so “simple” they are rarely explained in any detail. We 
will not do that. We will give you the secrets of all the tricks so you can become 
a master. Okay, at least you will be able to follow the algorithms that we will 
soon present. 


6.1 Start Vectors 

A quantum algorithm needs to start on a simple vector. Just like classical algo¬ 
rithms, we usually restrict algorithms to start in a simple state. It may be okay 
to assume that all memory locations are set to zero, but it is usually not okay to 
assume that memory contains the first m primes. We have the same philosophy: 
start states must be simple. 

The simplest start state possible is $e_0 = [1, 0, 0, \dots, 0]$. This is the one we
would like generally to start with, but there are exceptions. Indeed, the first
algorithm we will shortly present starts up in $e_1 = [0, 1, 0, 0]$. Because this is
also an elementary vector, it is reasonable to allow this as the initial state. An
alternative is to show how to move from $e_0$ to $e_1$ in a manner independent of
the dimension $N$.

The idea is that with respect to the indexing scheme, 0 corresponds to the
string $0^n$ and 1 to $0^{n-1}1$, which differ only in the least place. Hence, we can
regard the change as local with respect to the indexing of strings, which in
turn corresponds to the ordering in tensor products. Thus, inverting the last bit,
which entails swapping $e_0$ and $e_1$, is accomplished by tensoring $I^{\otimes(n-1)}$ with
the matrix $X$ we saw in chapter 3:

$$I^{\otimes(n-1)} \otimes X, \qquad X = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$



Although this creates a huge matrix and looks heavy, it is just the
linear-algebraic way of applying a NOT gate to the last string index. We can do this
on the $r$th bit from the right, inducing permutations of $[N]$ that move indices
up or down by $2^r$. Note that we have not transposed only $e_0$ and $e_1$; we must
be aware of other effects on the Hilbert space.

Interchanging $e_1$ and $e_2$ involves a different operation. In string indices, we
need to swap $\dots 01$ with $\dots 10$. This is not totally local as it involves two
indices, but nearly so. Now we need to tensor the $4 \times 4$ swap matrix

$$\mathrm{SWAP} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

after $I^{\otimes(n-2)}$. This can be regarded as a benefit of the binary function
$f(a, b) = (b, a)$ being invertible.
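
Because both operations are permutation matrices, their effect can be sketched purely on indices without building any large matrix. The helper names `flip_bit`, `swap_last_two`, and `apply_permutation` below are hypothetical, and the sketch counts bit places from 0 at the right, so flipping place `r` moves an index by exactly $2^r$.

```python
def flip_bit(index, r):
    """Flip the bit of `index` with weight 2**r (r = 0 is the last string place)."""
    return index ^ (1 << r)

def swap_last_two(index):
    """Exchange the last two bits of `index`, as SWAP on the last two places does."""
    low, high = index & 1, (index >> 1) & 1
    return (index & ~3) | (low << 1) | high

def apply_permutation(perm, vec):
    """Move the amplitude at index i to index perm(i)."""
    out = [0] * len(vec)
    for i, amp in enumerate(vec):
        out[perm(i)] = amp
    return out

N = 8
e0 = [1] + [0] * (N - 1)
e1 = apply_permutation(lambda i: flip_bit(i, 0), e0)   # NOT on the last place
e2 = apply_permutation(swap_last_two, e1)              # SWAP on the last two places
print(e1)
print(e2)
```

Note that `flip_bit(i, 0)` also exchanges indices 2 and 3, 4 and 5, and so on, which is the "other effects on the Hilbert space" mentioned above.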

Another interesting start vector $j$ is the sum of all the $e_k$, which must be
divided by $\sqrt{N}$ to keep it a unit vector. Aside from this normalizing factor, it
has a 1 in each entry. We can obtain this from $e_0$ by noting that the Hadamard
matrix $H_N = H^{\otimes n}$ has 1’s in its entire left column, and moreover it comes with
the same $\sqrt{N} = 2^{n/2}$ factor. So we have

$$j_N = H_N e_0.$$

Again by the tensor product feature, this operation is local to each individual
string index. In terms of strings, it creates a weighted sum over all of $\{0,1\}^n$.
If we apply this to any other $e_k$, then we get a vector with some $-1$ entries
in place of $+1$ but giving the same squared amplitudes. This is because other
columns in $H_N$ have negative entries.
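
A small Python sketch makes the column picture concrete. It assumes the standard entry formula for $H^{\otimes n}$, namely $(-1)^{x \bullet y}/\sqrt{N}$ with $x \bullet y$ the bitwise inner product modulo 2; the function names are hypothetical.

```python
import math

def hadamard_entry(x, y, n):
    """Entry H_N[x, y] = (-1)^(x . y) / sqrt(N), with x . y taken bitwise mod 2."""
    sign = -1 if bin(x & y).count("1") % 2 else 1
    return sign / math.sqrt(2 ** n)

def hadamard_column(k, n):
    """H_N e_k is simply column k of H_N."""
    return [hadamard_entry(x, k, n) for x in range(2 ** n)]

n = 2
j = hadamard_column(0, n)       # H_N e_0, the start vector j_N
other = hadamard_column(3, n)   # H_N e_3: same magnitudes, some signs flipped
print(j)
print(other)
```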

Finally, we may wish to extend our start vectors to initialize helper bits.
Generally, this means extending the underlying binary string with some number $m$
of 0s. In that case, because we already regard $e_0$ as our generic start vector, we
need do nothing. Algebraically what we are doing is working in the product
Hilbert space $\mathbb{H}_N \otimes \mathbb{H}_M$, with $M = 2^m$, because $e_0$ in the product space is just
the tensor product of the first basis vectors of the two spaces. If we want to
change any state $a$ to $a \otimes e_0$, then we may suppose the extra helper bits were
there all along.


6.2 Controlling and Copying Base States 

Can we change any state $a$ to $a \otimes a$? Algebraically, the latter means the state $b$
such that indexing by strings $x, y \in \{0,1\}^n$, we have

$$b(xy) = a(x)a(y).$$

The famous no-cloning theorem says that there is no $2^{2n} \times 2^{2n}$ unitary
operation $U$ such that for all $a$,

$$U(a \otimes e_0) = a \otimes a.$$

However, a limited kind of copying is possible that can replicate computations
and help to amplify the success probability of algorithms after taking
measurements.

Theorem 6.1 For any $n \ge 1$, we can efficiently build a $2^{2n} \times 2^{2n}$ unitary
operation $C_n$ that converts any vector $a'$ into $b$ such that for all $x, y \in \{0,1\}^n$,

$$b(xy) = a'(x(x \oplus y)).$$

In particular, if $a' = a \otimes e_{0^n}$, then we get for all $x$,

$$a(x) = a'(x0^n) = b(xx),$$

so that measuring $b$ yields $xx$ with the same probability that measuring $a$ yields
$x$.


Proof. First consider $n = 1$. The operator must make $b(00) = a'(00)$, $b(01) = a'(01)$,
$b(10) = a'(11)$, and $b(11) = a'(10)$. This is done by the $4 \times 4$ permutation
matrix

$$\mathrm{CNOT} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}.$$

As we saw in section 4.3, the name CNOT stands for “Controlled-Not”
because the second qubit is negated if the first qubit has a 1 value and is left
unchanged otherwise.

For $n = 2$, the indices are length-4 strings $y_1 y_2 z_1 z_2$, which are permuted
into $y_1 y_2 (y_1 \oplus z_1)(y_2 \oplus z_2)$. This is a composition of two CNOT operations,
one on the first and third indices (preserving the others), which we denote by
$C_{1,3}$, and the other on the second and fourth, written as $C_{2,4}$. For general $n$,
the final operator is the composition $C_n = C_{1,n+1} C_{2,n+2} \cdots C_{n,2n}$. □

Magically, what this does is clone every basis state at once. If $a = e_x$, then
$b$ is the same as $a \otimes a$ after all. An example of why this doesn’t violate the
no-cloning theorem is that when $a$ is a non-basis state, such as $\frac{1}{\sqrt{2}}(e_x + e_y)$,
$a \otimes a$ is generally not the same as $\frac{1}{\sqrt{2}}(e_{xx} + e_{yy})$.

We can now do various things. We can run two operations $U_f$ computing a
function $f(x)$ on $x$ side by side. Or we can do just one, applying $I^{\otimes n} \otimes U_f$ to $e_{xx}$
to get $e_{x f(x)}$. Essentially, we are using two Hilbert spaces that we put together
by a product. We can also arrive at this kind of state in the manner shown next.
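
The index formula of Theorem 6.1 translates almost directly into code. The sketch below, with hypothetical helper names `copy_op`, `tensor`, and `basis`, applies $C_n$ for $n = 2$ and contrasts a basis state, which is copied, with a superposition, which is not cloned.

```python
import math

def copy_op(vec, n):
    """Apply C_n via b(xy) = a'(x (x XOR y)) for n-bit strings x, y."""
    mask = (1 << n) - 1
    out = [0.0] * len(vec)
    for index in range(len(vec)):
        x, y = index >> n, index & mask
        out[index] = vec[(x << n) | (x ^ y)]
    return out

def tensor(u, v):
    """Tensor product of two vectors, indexed by the concatenated strings."""
    return [ui * vi for ui in u for vi in v]

def basis(k, dim):
    return [1.0 if i == k else 0.0 for i in range(dim)]

n, N = 2, 4
e_zero = basis(0, N)

# A basis state is copied: C_n(e_x (x) e_0) = e_x (x) e_x.
x = 2
copied = copy_op(tensor(basis(x, N), e_zero), n)
basis_cloned = copied == tensor(basis(x, N), basis(x, N))

# A superposition is not cloned: we get (e_xx + e_yy)/sqrt(2), not a (x) a.
s = 1 / math.sqrt(2)
a = [s, 0.0, 0.0, s]                      # (e_00 + e_11)/sqrt(2)
b = copy_op(tensor(a, e_zero), n)
superposition_cloned = b == tensor(a, a)
print(basis_cloned, superposition_cloned)
```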


6.3 The Copy-Uncompute Trick 

Suppose we wish to compute $f: \{0,1\}^n \to \{0,1\}^m$, where $m < n$. Such an $f$ is
not invertible, so we cannot expect to map an input state $e_x$ to a quantum state
that uniquely corresponds to $y$. We have already seen in sections 4.3 and 5.3
the idea of replacing $f$ by the function

$$F(x, y) = (x, y \oplus f(x)).$$

Then $F: \{0,1\}^{n+m} \to \{0,1\}^{n+m}$ is a bijection, and the original function $f$ is
recoverable via $F(x, 0^m) = (x, f(x))$.

Now suppose we have any quantum operation $U$ on the “x” part, where $f(x)$
might be embedded as a substring in $m$ indexed places. We can automatically
obtain the corresponding $F(x)$ via the computation:

$$(U^* \otimes I_m)\, C_m\, (U \otimes I_m),$$

where the $C_m$ is applied to those index places and to $m$ ancilla places. This
effectively lifts out and copies $f(x)$ into the fresh places. The final $U^*$ then
inverts what $U$ did in the first $n$ places, “cleaning up” and leaving $x$ again.
Here is a diagram for $n = 4$ and $m = 2$ where the values $f(x) = y_1 y_2 \in \{0,1\}^2$
are computed on the second and third wires and then copied to the ancillae:



This trick is called copy-uncompute or compute-uncompute. It is important
to note that it works only when the quantum state after applying $U$ and before
$C_m$ is a superposition of only those basis states that have $f(x)$ in the set of
quantum coordinates to which the controls are applied. If there is any disagreement
there in the superposition, then the results can be different.

This again is why the trick does not violate the no-cloning theorem. For a
simple concrete example, consider the following quantum circuit, noting that
the Hadamard matrix is its own inverse, i.e., is self-adjoint:

On input $e_{00}$, that is, $x_1 = x_2 = 0$, the first Hadamard gate gives the control
qubit a value that is a superposition. Hence, the second Hadamard gate does not
“uncompute” the first Hadamard to restore $x_1 = 0$. The action can be worked
out by the following matrix multiplication (with an initial factor of $\frac{1}{2}$):

$$\frac{1}{2} \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix} = \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & -1 \\ 1 & 1 & -1 & 1 \\ 1 & -1 & 1 & 1 \\ -1 & 1 & 1 & 1 \end{bmatrix}.$$

This maps $e_{00}$ to $\frac{1}{2}[1, 1, 1, -1]$, thus giving equal probability to getting 0 or 1
on the first qubit line.
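
The matrix product is easy to check mechanically. The following sketch multiplies the unnormalized integer matrices and then accounts for the overall factor of $\frac{1}{2}$ when computing probabilities; `matmul` is a hypothetical helper.

```python
def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Unnormalized H (x) I; each copy carries a factor 1/sqrt(2), so the product carries 1/2.
H_I = [[1, 0, 1, 0],
       [0, 1, 0, 1],
       [1, 0, -1, 0],
       [0, 1, 0, -1]]
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

product = matmul(H_I, matmul(CNOT, H_I))
first_column = [row[0] for row in product]         # image of e_00, times 2
probabilities = [(v / 2) ** 2 for v in first_column]
prob_first_qubit_0 = probabilities[0] + probabilities[1]
print(product)
print(prob_first_qubit_0)
```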

However, if $U$ includes a preamble transforming $e_0$ to $e_x$ and then leaves a
definite value $y$ on the controlled lines before the rest of the circuit does $U^*$,
then the computation does end with the first $n$ places again zeroed out, i.e.,
in some state $f = e_{0^n} \otimes e_y$. This finally justifies why we can regard $e_0$ as the
only input we need to consider. It emphasizes the goal of efficiently preparing
a state from which a desired value $f(x)$ can be recovered by measurement.

As long as we are careful to represent the linear algebra correctly, we will 
not be confused between these two eventualities. Then we can do more tricks 
with superpositions and controls. 
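
For contrast with the Hadamard example, here is a sketch of copy-uncompute in the well-behaved case where the computed wire holds a definite value. Everything concrete in it is an assumption made for illustration: $n = 2$, $m = 1$, the function $f(x) = x_1 \oplus x_2$, and a $U$ that is a single CNOT leaving $f(x)$ on the second wire. Since all three operations are permutations, the sketch acts on basis indices only.

```python
def compute(index):
    """U on the x part (first two of three bits): leaves x1 XOR x2 on wire 2."""
    x1, x2, anc = (index >> 2) & 1, (index >> 1) & 1, index & 1
    return (x1 << 2) | ((x1 ^ x2) << 1) | anc

def copy_to_ancilla(index):
    """C_1 with control on wire 2 and the ancilla (last bit) as target."""
    ctrl = (index >> 1) & 1
    return index ^ ctrl

def copy_uncompute(index):
    """(U* (x) I) C_1 (U (x) I) on a basis index; this particular U is its own inverse."""
    return compute(copy_to_ancilla(compute(index)))

# Start from x1 x2 0 and read off x1 x2 f(x).
results = {format(x << 1, "03b"): format(copy_uncompute(x << 1), "03b") for x in range(4)}
print(results)
```

In every case the first two bits return to $x$ and the ancilla holds $f(x)$, which is the behavior of $F(x, 0) = (x, f(x))$.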


6.4 Superposition Tricks 


Recall our $j_N$ vector, which in the case $n = 2$, $N = 4$ is $\frac{1}{2}[1, 1, 1, 1]$. Feeding it
on the first $n$ of $2n$ quantum coordinates and following it with controls gives
the following state:

$$(C_n(j_N \otimes e_0))(xy) = \begin{cases} \dfrac{1}{\sqrt{N}} & \text{if } y = x \\ 0 & \text{otherwise.} \end{cases}$$

Furthermore, these ideas show that we can construct a vector $b$ such that

$$b(xy) = \begin{cases} \dfrac{1}{\sqrt{N}} & \text{when } y = f(x) \\ 0 & \text{otherwise.} \end{cases}$$

Here $xy$ is just the concatenation of the strings $x$ and $y$. Moreover, by the last
section, we can obtain a version of $b$ even when $y$ is just a single bit. In either
case, we can also write

$$b = \frac{1}{\sqrt{N}} \sum_{x \in \{0,1\}^n} (e_x \otimes e_{f(x)}).$$

This is one of two instances where we find it most apt to mention Dirac
notation.$^1$ The Dirac notation for this state is

$$b = \frac{1}{\sqrt{N}} \sum_{x \in \{0,1\}^n} |x\rangle |f(x)\rangle.$$

Definition 6.2 Given $f: \{0,1\}^n \to \{0,1\}^m$, the state
$s_f = \frac{1}{\sqrt{N}} \sum_{x} |x\rangle |f(x)\rangle$
is called the functional superposition of $f$.
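
To see what the functional superposition looks like as a concrete vector, the sketch below fills in its entries directly from Definition 6.2 (it does not simulate the gates that would prepare it). The example function $f(x) = x \bmod 2$ and the name `functional_superposition` are assumptions chosen for illustration.

```python
import math

def functional_superposition(f, n, m):
    """Return s_f as a list indexed by the concatenated string x f(x)."""
    N = 2 ** n
    state = [0.0] * (N * 2 ** m)
    for x in range(N):
        state[(x << m) | f(x)] = 1 / math.sqrt(N)
    return state

# Hypothetical example function: f(x) = x mod 2, with n = 2 and m = 1.
s_f = functional_superposition(lambda x: x % 2, n=2, m=1)
support = [format(i, "03b") for i, amp in enumerate(s_f) if amp != 0]
print(support)
```

Each string in the support is some $x$ followed by $f(x)$, and every such entry has amplitude $1/\sqrt{N}$.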

We can also extend the conditional idea of $C_n$ directly to any given quantum
operation $U$. Define $CU$ by

$$((CU)a)(0x) = a(x); \qquad ((CU)a)(1x) = (Ua)(x).$$

We have used extra parentheses to make clear that $CU$ is a name, not the
composition of matrices called $C$ and $U$, and it is read “Control-$U$.” Our CNOT
operation did this to our matrix $X$ of the unitary NOT operation, which explains
the name. We can also iterate this, for instance, to do CCNOT. This yields our
friend the Toffoli gate again.
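
In matrix terms, with the control as the leading index, $CU$ is the block matrix with the identity in the upper-left block and $U$ in the lower-right block. A minimal sketch, using a hypothetical helper `controlled`:

```python
def controlled(U):
    """Block matrix [[I, 0], [0, U]]: act by U only when the leading control bit is 1."""
    d = len(U)
    top = [[1 if i == j else 0 for j in range(d)] + [0] * d for i in range(d)]
    bottom = [[0] * d + list(row) for row in U]
    return top + bottom

X = [[0, 1],
     [1, 0]]
CNOT = controlled(X)
CCNOT = controlled(CNOT)   # the Toffoli gate

# Read the Toffoli gate as a permutation: column j has its 1 in the row that e_j maps to.
toffoli_images = [col.index(1) for col in zip(*CCNOT)]
print(CNOT)
print(toffoli_images)
```

Only the last two indices, 110 and 111, are exchanged, matching the usual description of the Toffoli gate.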


6.5 Flipping a Switch 

There are many old jokes of the form: How many X-es does it take to change a
light bulb? In quantum computation, everything is reversible, and that applies
to jokes as well: If you change a light bulb, how many X-es can you affect?
The answer is: as many as you like.

1 The other involves writing the projector $P_a$ defined in section 5.5 via the outer product matrix,
which is defined generally for all vectors $a, b$ by $|a\rangle\langle b|[i,j] = a(i)b(j)$, as $P_a = |a\rangle\langle a|$. Thus, for all
vectors $x$, $P_a x = |a\rangle\langle a|x\rangle = a\langle a, x\rangle$. While $P_a$ is not unitary, it contributes to the unitary reflection
operator $\mathrm{Ref}_a = 2P_a - I$.

Our light bulb can be the $(n+1)$st qubit, call it $y$. Suppose we multiply
it by a unit complex number $a$, such as $-1$. It may seem that we are only
flipping the sign of the last qubit, and we might even wrongly picture the
$(n+1) \times (n+1)$ matrix that is the identity except for $a$ in the bottom right corner.
The unitary matrices that are really involved, however, are $2^{n+1} \times 2^{n+1}$ acting
on the Hilbert space, and by linearity, the scalar multiplication applies to all
coordinates. Put another way, if we start with a product state $z \otimes e_y$ and change
the latter part to $a e_y$, then the resulting tensor product is mathematically the
same as $(az) \otimes e_y$. With $a = -1$, we can interpret this as $z$ being flipped instead.
This feels strange, but both come out the same in the index-based calculations.

This becomes a great trick if we can arrange for $a$ itself to depend on the
basis elements $e_x$. Given a Boolean function $f$ with one output bit, let us
return to the computation of the reversible function $F(x, y) = (x, (y \oplus f(x)))$.
Our quantum circuits for $f$ have thus far initialized $y$ to 0. Let us instead arrange
$y = 1$ and then apply a single-qubit Hadamard gate. Thus, instead of starting
up with $e_x \otimes e_0$, we have $e_x \otimes d$, where $d$ is the “difference state”

$$d = \left(\frac{1}{\sqrt{2}}, \frac{-1}{\sqrt{2}}\right) = \frac{1}{\sqrt{2}}(e_0 - e_1).$$

Now apply the circuit computing $F$. By linearity we get:

$$\begin{aligned}
F(x, d) &= \frac{1}{\sqrt{2}}\bigl(F(x0) - F(x1)\bigr) \\
&= \frac{1}{\sqrt{2}}\bigl(e_x \otimes e_{0 \oplus f(x)} - e_x \otimes e_{1 \oplus f(x)}\bigr) \\
&= \frac{1}{\sqrt{2}}\bigl(e_x \otimes (e_{0 \oplus f(x)} - e_{1 \oplus f(x)})\bigr) \\
&= e_x \otimes d',
\end{aligned}$$

where

$$d' = \begin{cases} \frac{1}{\sqrt{2}}(e_0 - e_1) & \text{if } f(x) = 0 \\ \frac{1}{\sqrt{2}}(e_1 - e_0) & \text{if } f(x) = 1 \end{cases} \;=\; (-1)^{f(x)} d.$$

Thus, we have flipped the last quantum coordinate by the value $a_x = (-1)^{f(x)}$.
Well, actually no: by the above reasoning, what we have equally well done
is that when presented with a basis vector $e_x$ as input, we have multiplied it
by the $x$-dependent value $a_x$. We have involved the $(n+1)$st coordinate, but
because we have obtained $a_x e_x \otimes d$, we can regard it as unchanged. In fact,
we can finish with another Hadamard and NOT gate on the last coordinate to
restore it to 0. On the first $n$ qubits, over their basis vectors $e_x$, what we have
obtained is the action

$$e_x \mapsto (-1)^{f(x)} e_x.$$

This is the action of the Grover oracle. We have thus proved theorem 5.6 in
chapter 5. We can summarize this and the conclusion of section 6.4 in one
theorem statement:

Theorem 6.3 For all (families of) functions $f: \{0,1\}^n \to \{0,1\}^m$ that are
classically feasible, the mapping from $e_{0^{n+m}}$ to the functional superposition $s_f$
and the Grover oracle of $f$ are feasible quantum operations. □
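
The sign-flipping calculation of section 6.5 can be replayed numerically. The sketch below uses a hypothetical Boolean function on 2-bit inputs and a hypothetical helper `apply_F` for the permutation $F(x, y) = (x, y \oplus f(x))$, and checks that $e_x \otimes d$ is mapped to $(-1)^{f(x)}\, e_x \otimes d$.

```python
import math

def apply_F(state, f):
    """Apply the permutation F(x, y) = (x, y XOR f(x)) to a state indexed by xy."""
    out = [0.0] * len(state)
    for index, amp in enumerate(state):
        x, y = index >> 1, index & 1
        out[(x << 1) | (y ^ f(x))] = amp
    return out

def f(x):
    """Hypothetical Boolean function on 2-bit inputs: true only for x = 3."""
    return 1 if x == 3 else 0

s = 1 / math.sqrt(2)
d = [s, -s]                                   # the difference state

signs, kickback_ok = [], True
for x in range(4):
    start = [0.0] * 8
    start[2 * x], start[2 * x + 1] = d        # e_x (x) d
    end = apply_F(start, f)
    sign = -1 if f(x) else 1
    kickback_ok = kickback_ok and end == [sign * v for v in start]
    signs.append(round(end[2 * x] / s))       # recover a_x from the first component
print(signs, kickback_ok)
```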


6.6 Measurement Tricks 

There are also several tricks involving measurement. Suppose that the final
state of some quantum algorithm is $a$. We now plan to take a measurement
that will return $y$ with probability $|a(y)|^2$. In some cases, we can compute this
in closed form, whereas in other cases, we can approximate it well. In other
algorithms, we use the following idea: “Everybody has to be somewhere.”

Let $S$ be a subset of the possible indices $y$, and suppose that we can prove

$$\sum_{y \in S} |a(y)|^2 \ge c > 0$$

for some constant $c$. Then we can assert that with probability at least $c$ a
measurement will yield a good $y$ from the set $S$. Note, the power of this trick is that
we do not have to understand the values of $a(z)$ for $z$ not in the set $S$. We need
only understand those in the set $S$. This is used in chapter 11 when we study
Shor’s factoring algorithm.
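
In code, the quantity being bounded is just a partial sum of squared magnitudes. The state and the set `S` below are invented for illustration, and `probability_of` is a hypothetical helper.

```python
def probability_of(state, S):
    """Probability that measuring `state` returns an index in S."""
    return sum(abs(state[y]) ** 2 for y in S)

# Hypothetical final state on two qubits (amplitudes may be complex).
state = [0.5, 0.5j, -0.5, 0.5]
S = {2, 3}                       # the "good" outcomes we care about
c = probability_of(state, S)
print(c)
```

Nothing about the amplitudes outside `S` was needed to compute `c`, which is the point of the trick.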

When the set $S$ is the set of all indices having a “1” in a particular place, this
is called measuring one qubit. Note that $S$ includes exactly half of the indices,
as does its complement, which equally well defines a one-qubit measurement.
The idea can be continued to define $r$-qubit measurements, each of which
“targets” a particular outcome string $w \in \{0,1\}^r$ and involves the particular set $S_r$
of $N/2^r$ indices that have $w$ in the respective places.

Theoretically, after a one-place measurement, the quantum computation can
continue on the smaller Hilbert space of the remaining places. This does not
matter to us because all algorithms we cover do their measurements at the
end of quantum routines. As a footnote, we cover the principle of deferred
measurement, which often removes the need to worry about this possibility
because it illustrates the above controlled-$U$ trick. We state it as a theorem.

Theorem 6.4 If the result $b$ of a one-place measurement is used only as
the test in one or more operations of the form “if $b$ then $U$,” then exactly the
same outputs are obtained upon replacing $U$ by the quantum controlled
operation $CU$ with control index the same as the index place being measured and
measuring that place later without using the output for control.

Proof. As before, we visualize the control index being the first index, but it 
can be anywhere. Suppose in the new circuit the result of the measurement is 0. 
Then the CU acted as the identity, so on the first index, the same measurement 
in the old circuit would yield 0, thus failing the test to apply U and so yielding 
the identity action on the remainder as well. If the new circuit measures 1, then 
because CU does not affect the index, the old circuit measured 1 as well, and 
in both cases the action of U is applied on the remainder. □ 


6.7 Partial Transforms 

Another trick is applying an operator to “part” of the space. The general prin¬ 
ciple is really quite simple. Suppose that U is a unitary transform defined on 
vectors a in some Hilbert space. Then by definition there is a function u(x,k) 
so that 

$$(Ua)(x) = \sum_k u(x,k)\,a(k).$$

Now suppose that we want to extend $U$ to apply to the vector $b(xy)$. The
question is how do we do this? The natural idea is to imagine that $b$ is really many
different vectors, each of the form $b(xy_0)$ for a different fixed value of $y_0$. If
we want the result of applying $U$ to one of these, it should be

$$\sum_k u(x,k)\,b(ky_0).$$

Therefore, the result of applying $U$ to the vector $b$ is:

$$c(xy) = \sum_k u(x,k)\,b(ky).$$

As an example, let $a(xy)$ be a vector. We can apply the Hadamard transform
just to the first “x” part as follows. The result is

$$b(xy) = \frac{1}{\sqrt{N}} \sum_{t} (-1)^{x \bullet t}\, a(ty),$$

where $N$ is the dimension of the “x” part and $x \bullet t$ is the inner product of the
bit strings $x$ and $t$ modulo 2.

This is what is meant by applying the Hadamard only to the “x” coordinates. 
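
The partial-transform formula can be implemented exactly as written, summing over the first index while carrying the second along untouched. In this sketch the function names and the storage convention (string $xy$ stored at position `x * dim_y + y`) are assumptions.

```python
import math

def apply_to_first_part(u, b, dim_x, dim_y):
    """c(xy) = sum_k u(x, k) b(ky), with xy stored at index x * dim_y + y."""
    c = [0.0] * (dim_x * dim_y)
    for x in range(dim_x):
        for y in range(dim_y):
            c[x * dim_y + y] = sum(u(x, k) * b[k * dim_y + y] for k in range(dim_x))
    return c

def hadamard(x, k):
    """Entry of the 2 x 2 Hadamard matrix."""
    return (-1 if x & k else 1) / math.sqrt(2)

# Apply H to the first qubit of e_00; the second qubit is untouched.
e00 = [1.0, 0.0, 0.0, 0.0]
c = apply_to_first_part(hadamard, e00, dim_x=2, dim_y=2)
print(c)
```

The result has equal amplitudes on indices 00 and 10 and zero elsewhere, which is the same as applying $H \otimes I$ to $e_{00}$.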


6.8 Problems 


6.1. Suppose that $a$ is a unit vector. If we know that $a(k) = 1$, then what can
we say about $a(\ell)$ for $\ell \ne k$?

6.2. Consider all unit vectors of dimension 4 that have $\pm\frac{1}{2}$ as entries. Show that
all can be obtained from one another by using only Hadamard transformations.

6.3. Fix a dimension $N$. Consider the group of unitary matrices generated by
the Hadamard matrix $H_N$ and all the permutation matrices. What are the sizes
of the group for $N = 2$ and $N = 4$? What about for general $N > 4$?

6.4. Prove the no-cloning theorem in the case $n = 2$: Suppose for sake of
contradiction that $U$ is a unitary matrix such that for any $a = [a, b]^T$,

$$U(a \otimes e_0) = a \otimes a.$$

To reach a contradiction, take a second arbitrary state $b$ and use the property
that unitary matrices preserve inner products.

6.5. Show how to construct a unitary matrix $U$ such that
$U[a, b, 0, 0]^T = \frac{1}{\sqrt{2}}[a, b, a, b]^T$. Why doesn’t this contradict the no-cloning theorem?

6.6. Recall the real-valued rotation matrices $R_x(\theta)$ from the last chapter’s
exercises. Show that the controlled rotation $CR_x(\theta)$ can be simulated by two
CNOT gates sandwiched in with the half-rotation $CR_x(\theta/2)$ and its inverse
$CR_x(-\theta/2)$ on the target qubit line.


6.7. Recall the $T$ and $V$ matrices from exercises in chapter 3:

$$T = \begin{bmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{bmatrix}, \qquad V = \frac{1}{\sqrt{2}} \begin{bmatrix} e^{i\pi/4} & e^{-i\pi/4} \\ e^{-i\pi/4} & e^{i\pi/4} \end{bmatrix}.$$

Write out the matrices of their controlled versions, $CT$ and $CV$. What are their
squares?

6.8. Show that $CV$ can be written as a composition of (appropriate powers of)
$T$ gates before and after a controlled rotation $CR_x(\theta)$. Deduce that CNOT is
the only two-qubit gate needed to simulate $CV$.


The next two problems contribute to the converse direction of simulating
the $T$-gate via Hadamard and Toffoli, which will be finished in the next
chapter’s exercises. Recall problem 3.12 from chapter 3, where we computed the
commutator of the $2 \times 2$ matrices $H$ and $S$.

6.9. Compute the commutator of $I \otimes H$ with

$$CS = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & i \end{bmatrix}.$$

Now is it possible to multiply by a scalar $c$ such that all entries are powers of
$i$?


6.10. Undo the last part of the commutator in the last problem, that is, sandwich
$CS$ between two Hadamard gates on the second qubit line. What gate
do you have? Conclude that Hadamard and $CS$ suffice to simulate the $T$-gate
using one extra qubit line, as well as the Toffoli gate without needing any extra
lines.


The last problems here show how to use controlled gates to decompose the 
quantum Fourier transform into basic gates. 

6.11. Show that for any $2^k \times 2^k$ unitary matrix $A$, the block matrix

$$\begin{bmatrix} I^{\otimes k} & A \\ I^{\otimes k} & -A \end{bmatrix}$$

can be written as the composition of $H \otimes I^{\otimes k}$ and the controlled matrix $CA$.

6.12. If $A$ in problem 6.11 is a diagonal matrix, how many two-qubit controlled
gates do you need to simulate $CA$?

6.13. Recalling the recursive equation for the quantum Fourier transform in
problem 5.6,

$$F_N = \frac{1}{\sqrt{2}} \begin{bmatrix} I^{\otimes(n-1)} & D_{N/2} \\ I^{\otimes(n-1)} & -D_{N/2} \end{bmatrix} \begin{bmatrix} F_{N/2} & 0 \\ 0 & F_{N/2} \end{bmatrix} K_n^{-1},$$

apply problems 6.11 and 6.12 to express it all as a composition of Hadamard
gates, swap gates, and 2-qubit controlled twists. Mindful that the second matrix
here is just $F_{N/2} \otimes I$, how many gates do you need?


6.9 Summary and Notes 

As in other areas, the key to understanding is often more than knowing the 
“big” results—it is also important to know the important little tricks. This chap¬ 
ter has exemplified some tricks using our index-based notation for vectors and 
matrices. The tricks focus on the parts of linear algebra that are most relevant 
to quantum computing, so for further reading we suggest texts such as Nielsen 
and Chuang (2000), Yanofsky and Mannucci (2008), Hirvensalo (2010), and 
Rieffel and Polak (2011). 

We have attempted to distinguish between tricks of general algorithmic 
importance and tricks and properties of specific quantum gates. We have put a 
lot of the latter in the exercises of chapters 3, 5, and here. The text by Williams 
(2011) has a cornucopia of further details about quantum gates, and even more 
can be said about engineering issues for gates and circuits. We have regarded 
these exercises first as giving practice in linear algebra, and second as a reason¬ 
able substitute for in-text coverage of model-specific simulation theorems. The 
message of the last problems here is that commutators involving Hadamard
and $\theta$-angled phase gates, with CNOT and ancilla qubits to nail things down,
enable simulating the effect of gates of phase angles $\theta/2$, $\theta/4$, $\theta/8$, and so
on. We have chosen not to go further in proving that this enables sufficiently
fine approximation of the twists $T_{\pi/2^{n-1}}$ involved in the representation of the
QFT in problem 6.13, as the full details would take us out of scope now. This
formally justifies counting the QFT as feasible with these gate sets, in Shor’s
algorithm and other applications, but we prefer to take the entire QFT as basic
while presenting the algorithms.





Phil’s Algorithm 


There is no quantum algorithm named after Phil—at least none that we know 
about. The goal of this chapter is to give the schema we will use for presenting 
all the rest of the quantum algorithms and say how they give their results. We 
will then tell you who Phil is. 

We will always start with a description of what the algorithm actually does. 
This will usually be of the form: 


Given an X , Phil’s algorithm finds a Y within time Z. 


Sometimes the goal is achieved always, otherwise it comes with a specified 
probability or expected length of time. We may add some additional comments 
on why the problem is interesting and important. 


7.1 The Algorithm 

Each algorithm will be presented as computing a series of vectors. In the first 
few algorithms, the number of vectors is fixed independent of the size of the 
input object X. This is an interesting point because you might have expected 
that the number of vectors would grow as X gets bigger. The reason for this 
is that each vector corresponds to a macrophase of the algorithm. Often the 
algorithms have only a small number of macrophases, where each phase does 
something different. 

In describing the vectors, we will always explain what Hilbert space they 
are from. Again, there is some commonality: all but those using the quantum 
Fourier transform are directly understandable in real spaces, whereas Shor’s 
algorithm uses a complex Hilbert space. Problems 7.8-7.14 explore this matter 
further. 


7.2 The Analysis 

Quantum algorithms are similar to classical ones in that often the algorithms’ 
descriptions are simpler than their analysis. The analysis of these algorithms 
usually will consist of giving an explicit description of what the i-th vector is. 
The algorithm gives the operational description of the vector: it is the result 
of applying unitary transformations to the start vector. Here we give a
nonoperational description: the vector is described by the following mathematical
expression.

Once we know what each vector is explicitly, we will know what the last
vector is. Then we will understand what the last step of each algorithm does,
because in all cases the last step is a quantum measurement. Of
course to understand measurements, we must know the amplitudes given by
the last vector, because the measurement returns $k$ with probability equal to
the squared amplitude of the $k$-th coordinate.

There is one more complication (isn’t there always?). Some algorithms are
finished after the measurement is made: the measurement’s value determines
the answer completely. Other algorithms require additional classical
processing on the result of the measurement. Some are a bit more
involved, in that they need the quantum algorithm to be run multiple
times. Each run of the quantum algorithm gives a small amount of
information about the desired answer. This happens, for example, with both
Simon’s and Shor’s algorithms.


7.3 An Example 

Here is Phil’s algorithm. We said there was no such algorithm, so we made
one up. It operates over a two-dimensional Hilbert space $\mathbb{H}_2$. The start vector
$a_0$ is

$$a_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}.$$

The next vector is $a_1$, which is equal to $H_2 a_0$; of course $H_2$ is the 2 × 2
Hadamard transform. Then we measure this vector and return the index 0 or 1.
We see 0 with probability $a_1^2(0)$ and 1 with probability $a_1^2(1)$. That is it: not
too exciting, but it does compute something. What it provides is the ability to
flip a fair coin. This will be a building block of other algorithms.
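
As a minimal sketch of this coin flip, the following Python fragment applies the 2 × 2 Hadamard matrix to the start vector and reads off the two measurement probabilities. The helper names `apply` and `measure` are our own illustrative choices, not notation from the text, and the measurement is simulated with an ordinary pseudorandom generator.

```python
import math
import random

# 2 x 2 Hadamard matrix, written out in full.
S = 1 / math.sqrt(2)
H = [[S, S],
     [S, -S]]

def apply(matrix, vector):
    """Multiply a square matrix by a column vector (hypothetical helper)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def measure(vector, rng=random):
    """Simulate a measurement: return index k with probability |vector[k]|^2."""
    weights = [abs(amplitude) ** 2 for amplitude in vector]
    return rng.choices(range(len(vector)), weights=weights)[0]

a0 = [1.0, 0.0]      # start vector e_0
a1 = apply(H, a0)    # the vector after the Hadamard transform
probs = [abs(amplitude) ** 2 for amplitude in a1]   # both entries are 1/2 up to rounding
outcome = measure(a1)                               # 0 or 1, each equally likely
```

Each call to `measure(a1)` plays the role of one run of the algorithm; the classical generator only stands in for the physical measurement.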


7.4 A Two-Qubit Example 

Phil becomes a bit more ambitious now, so he has two qubits. He carries out the
composition of $U_1 = H \otimes I$ and $U_2 = \mathrm{CNOT}$, which we illustrated in
chapter 4. We index vectors in this two-qubit space by xy, where x and y are single
bits. In the itemized format we use for quantum algorithms, this is what he
does:







The Algorithm 

1. The initial vector is $a_0$ so that $a_0(00) = 1$; that is, $a_0 = e_{00}$.

2. The next vector $a_1$ is the result of applying the Hadamard transform on
qubit line 1 only.

3. The final vector $a_2$ is the result of applying CNOT to $a_1$.


The Analysis


$$a_1 = \frac{1}{\sqrt{2}}(e_0 + e_1) \otimes e_0 = \frac{1}{\sqrt{2}}(e_{00} + e_{10}) = \frac{1}{\sqrt{2}}[1,0,1,0].$$


Then because CNOT swaps the third and fourth Hilbert-space coordinates, we
can jump right away to see that


$$a_2 = \frac{1}{\sqrt{2}}[1,0,0,1].$$


Thus far, we have not said anything about taking measurements; instead, we
are able to specify the final pure quantum state we get. In the Hilbert space
coordinates it doesn’t look exciting, but let’s interpret it back in the quantum
coordinates:

$$a_2 = \frac{1}{\sqrt{2}}(e_{00} + e_{11}).$$


This state is pure and not a tensor product of two other states, so it is entangled.
This matters immediately if and when we do a measurement. If we measure
both qubits, then we will only get 00 or 11, never 01 or 10. If we measure just
the first quantum coordinate and get 0, then we know already that any
measurement of the second quantum coordinate will give 0. Thus, Phil’s algorithm
has produced an entangled pair of qubits.
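
The two-qubit computation can be checked with a short sketch in Python. The matrices are written out in full with coordinates ordered 00, 01, 10, 11. The entanglement test at the end relies on a standard linear-algebra fact that the text does not state: a vector [c00, c01, c10, c11] is a tensor product of two single-qubit vectors exactly when c00·c11 − c01·c10 = 0. The names `matvec` and `product_test` are illustrative.

```python
import math

S = 1 / math.sqrt(2)
LABELS = ["00", "01", "10", "11"]

def matvec(matrix, vector):
    """Multiply a square matrix by a column vector (hypothetical helper)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# Hadamard on qubit line 1 only, that is, H tensor I.
H_I = [[S, 0, S, 0],
       [0, S, 0, S],
       [S, 0, -S, 0],
       [0, S, 0, -S]]

# CNOT swaps the third and fourth coordinates (10 and 11).
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

def product_test(c):
    """Zero exactly when c is a tensor product of two single-qubit vectors."""
    return c[0] * c[3] - c[1] * c[2]

a0 = [1, 0, 0, 0]        # e_00
a1 = matvec(H_I, a0)     # (e_00 + e_10) / sqrt(2), still a product state
a2 = matvec(CNOT, a1)    # (e_00 + e_11) / sqrt(2), entangled

probs = {label: abs(amp) ** 2 for label, amp in zip(LABELS, a2)}
```

Here `product_test(a1)` is zero while `product_test(a2)` is not, and `probs` puts all of the weight on the outcomes 00 and 11.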

This becomes significant when we are able to give the first qubit to someone
named “Alice” sitting 10 miles east of Lake Geneva and the second qubit
to her friend “Bob” sitting 10 miles west, and each does a measurement at
instants such that no signal of Alice’s result can reach Bob before he measures
and vice versa. Whatever result Alice gets, Bob gets too. Albert Einstein called
the effect “spooky” because it appeared to violate his own established principle
that influence could not propagate faster than light, but that worry hasn’t
stopped real Alices and real Bobs from executing this algorithm at a distance.
So thinking in physical terms, Phil’s little algorithm was good enough to stump
the great Einstein. However, we do not have to think in physical terms: Phil’s
output is just an ordinary vector in our four-dimensional Hilbert space. What
we do need to think more about, to finish the analysis of our algorithms, is
measurement.


7.5 Phil Measures Up 

Now we will tell you who Phil is. Phil is a mouse. Unlike a certain famous 
quantum cat, who does nothing except lie around half-dead all day, Phil is very 
active. Phil runs through mazes like many other laboratory mice, but there are 
some special things about Phil: 

• Phil runs through every path in the maze at once. Like we said, he is very 
active. He follows Yogi Berra’s advice: when he comes to a fork in the road, 
he takes it, becoming two Phils. 

• Some corridors in the mazes have a piece of cheese. When Phil eats a piece
of cheese, he turns into Anti-Phil. If Phil runs into Anti-Phil, they annihilate
each other, leaving nothing. Not a combination, as with Schrödinger’s cat,
but really nothing. However, if Anti-Phil eats a second piece of cheese, then
he turns into Phil again. A third piece makes Anti-Phil again, and so on.

• If Phil meets himself (not Anti-Phil) where corridors come together, then
they run alongside each other. If Anti-Phil meets Anti-Phil, then they
likewise start to form an Anti-Phil pack. If two opposite packs meet, then they
still cancel each other in pairs, leaving just the surviving members of one
pack or the other, if any.

• If a pack reaches an exit of the maze safely, then it can be put under an 
incubator that mutantly grows it to the square of the number in the pack. 
The mutant is the same whether the pack has Phils or Anti-Phils—it is a 
“Mighty Mouse.” Dividing its size by a certain number that depends on the 
number of stages with cheese and the manner of measurement gives a value 
between 0 and 1, which is the probability that Phil exits the maze there. 

• After the division, the mutants “collapse,” and Phil comes together as just
one ordinary mouse again. Or we think he does; the chapter end notes
briefly mention some debate about this. At least we can say that no
laboratory animals were harmed in the course of writing this book.






Phil can eat several kinds of cheese, but one French type is far and away his
favorite: Hadamard cheese. A corridor with Hadamard cheese is labeled $-1$
in the maze blueprints. More complex types of cheese might have labels like
$i$ and $-i$ and $e^{i\pi/4}$ and $e^{-i\pi/4}$, and these would create “Half-Moon Phil” and
“Crescent Phil” and “Gibbous Phil” and other “phased-out” spectral mice. But
as we will state formally in chapter 16, Hadamard cheese and “Anti-Phil” are
ultimately a rich enough basis. Assuming that the only nondeterministic gates
are h-many Hadamard gates and all mice are measured, the division number is
$2^h$.

The mazes have N entrances, one for every basis vector $e_x$, and N exits with
the same labels. They are built in stages, each with N opening and closing
junctures keeping the same labels, one stage for each basic quantum operation.
A Hadamard stage gives a choice of two corridors: one going straight across
and one diagonally. Cheese is placed in half of the horizontal corridors: those
whose label has a “1” in a certain one of the n places. The other stages we need
to consider are all permutations of $\{0,1\}^n$ and are built by routing the corridors
according to the permutation, giving no other choice. Corridors may “cross”
each other; that is, the maze is three-dimensional. Figure 7.1 gives the maze
for the above computation with one Hadamard gate and one CNOT gate.

Figure 7.1 

Maze for Hadamard on qubit 1 followed by CNOT on 1 and 2.

To interpret this, suppose Phil enters the maze at the upper left, that is, with
input 00. He can run across or down. Well, he does both. One Phil ends up
at 00, the other at 11. Although neither got to eat any cheese, the presence of
one Hadamard stage says to divide by 2 after squaring. Thus, each of the two
outcomes has measure $1^2/2^1 = 0.5$. The outcomes 01 and 10 have no Phils, so
they measure 0.

If Phil starts running at entrance 01, then each of those outcomes gets one 
Phil while 00 and 11 get none. 

Anti-Phil makes an appearance if we start Phil at 10. Phil scampers up to
00 and stays there without eating cheese, but another Phil scampers across,
eats cheese, and in the second stage jumps down to 11 as Anti-Phil. Thus, the
final state is different from the case before: allowing for the cheese factor, it is
$\frac{1}{\sqrt{2}}[1,0,0,-1]$ rather than $\frac{1}{\sqrt{2}}[1,0,0,1]$. But its measurements are the same: it
gives probability 0.5 on 00 and 0.5 on 11. People who bet on where the mouse
ends up don’t care: Anti-Phil gives the same measurement value as Phil. On
input 11, we similarly get Phil at 01 and Anti-Phil at 10.
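
The four runs through this maze can be tabulated with a small sketch that works in whole mice: the Hadamard stage is written with integer entries +1 and −1 (the −1 marking the cheese), and the division by $2^h$ is applied only after squaring. The function name `run_maze` and the constant names are our own; the matrices are the ones for the Hadamard-then-CNOT maze discussed here.

```python
LABELS = ["00", "01", "10", "11"]

# Hadamard stage on qubit 1 with integer "Phil counts"; -1 marks cheese.
H_STAGE = [[1, 0, 1, 0],
           [0, 1, 0, 1],
           [1, 0, -1, 0],
           [0, 1, 0, -1]]

# CNOT stage: a permutation that swaps levels 10 and 11.
CNOT_STAGE = [[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]]

NUM_HADAMARD_STAGES = 1

def matvec(matrix, vector):
    """Multiply a square matrix by a column vector (hypothetical helper)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def run_maze(entrance):
    """Return signed pack sizes and exit probabilities for one entrance."""
    start = [1 if label == entrance else 0 for label in LABELS]
    counts = matvec(CNOT_STAGE, matvec(H_STAGE, start))
    probs = [c * c / 2 ** NUM_HADAMARD_STAGES for c in counts]
    return counts, probs

for entrance in LABELS:
    counts, probs = run_maze(entrance)
    print(entrance, counts, probs)
```

A positive count is a pack of Phils and a negative count a pack of Anti-Phils; the sign disappears in the squaring step, which is why entrances 00 and 10 give the same probabilities.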

For the simplest case where Phil and Anti-Phil collide, see figure 7.2, which
shows two consecutive Hadamard gates on a single qubit. We have annotated
each juncture with the “Phil counts” for the respective pair of entrances 0 and
1, where positive values denote a Phil pack and negative values an Anti-Phil
pack.


Figure 7.2 

Maze for two consecutive Hadamard gates. 


(1,0) (1,1) (2,(1-1)) = (2,0) 



(0,1) (1,-1) ((1-1),2) = (0,2) 


Again, suppose Phil starts at the upper left, from 0. One Phil scampers
straight across twice to exit at 0, while another scampers diagonally down to
1 and back up to meet him at that exit. They exit as a pack of 2, which gets
squared to 4. Because there are $h = 2$ Hadamards, this gets divided by $2^h = 4$.
Thus, the probability of exiting at 0 is 1.0, which doesn’t leave room for exiting
anywhere else. But what about the other Phils?

The first Phil who scampered straight ahead has a second choice and can 
scamper diagonally down to the exit 1, still eating no cheese and hence ending 
there as Phil. The second Phil who scampered down, however, can stay down 
to eat the cheese and ends up at 1 as Anti-Phil. The Phil and Anti-Phil at 1 
cancel, leaving 0 as their measurement. Thus, it is impossible for Phil entering 
at 0 to exit at 1—people who bet on that outcome will never win. 

Similarly, if Phil enters at 1, then he must exit at 1. But note what happened: 
one Phil scampered up and then back down and never ate any cheese. The other 
scampered across to be Anti-Phil, but then ate the other cheese to become 
Phil again just in time. Thus, the two consecutive Hadamard gates leave the 
same condition on exit as they had on entering—they are equivalent to the 
identity operation, which means just corridors going straight across with no 
choices. Whether they are really the same as “no-operation” is another matter 
for debate—in practice, it seems that the real Phils get some wear and tear and 
mussed-up hair and give measurements that are close to 1.0 and 0.0 or to other 
intended values like 0.5 but not quite equal. 

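Of course this is just an elaboration of what happens when you multiply the
matrices. Here we are just doing

$$H \cdot H = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \cdot \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} = I.$$
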

Indeed, every self-adjoint matrix U, meaning $U = U^*$, that is also unitary
cancels itself out this way when squared. Using “Phil” helps visualize the
quantum-mechanical effects of superposition and interference, but the math
is just linear algebra. Hence, after the next chapter, we will “retire” him and do
proofs formally, but he will make a useful return in the last chapters.
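
The remark about self-adjoint unitary matrices follows in one line from the two defining properties. As a worked step, writing $U^*$ for the adjoint and $I$ for the identity:

```latex
U^2 \;=\; U \cdot U \;=\; U \cdot U^{*} \;=\; I,
\qquad\text{using first } U = U^{*} \text{ and then unitarity, } U U^{*} = I .
```

The Hadamard matrix is real and symmetric, so it is self-adjoint, and it is unitary; the same holds for CNOT, which is a symmetric permutation matrix.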


7.6 Quantum Mazes versus Circuits versus Matrices 

The main problem with our maze diagrams is that they do not scale: they grow
with N, which is exponential in n. Hence, most sources in quantum computation
prefer to write circuit diagrams, which scale with n instead. Here again is
our diagram for the Hadamard plus CNOT circuit, except that now we wish to
label the values carried by the circuit’s individual “wires”:

[Circuit diagram: two wires with inputs $x_1$, $x_2$ and outputs $z_1$, $z_2$; a Hadamard gate H followed by a CNOT, with one intermediate wire value marked “(?)”.]

Here $x_1$ and $x_2$ are variables denoting the input qubits, $z_1$ and $z_2$ are the
outputs, and y is a variable denoting the choice offered at the Hadamard gate. Our
unease with the circuit diagram, however, is represented by the “(?)” label:


Whereas a Boolean circuit, on a given input, always has 
a definite value (0 or 1) at every gate juncture, this is not 
always true of a quantum circuit owing to entanglement. 


The value at “(?)” could be 0 or 1, with equal probability in fact (and hence
equal amplitudes $1/\sqrt{2}$), but it is not even correct to say that $(e_0 + e_1)/\sqrt{2}$ is
its value. The value is entangled with the value of y, with both depending on
the input values.

The similar diagram for two Hadamard gates is even simpler but does not
immediately help us calculate that they cancel:

[Circuit diagram: a single wire from input x through two consecutive Hadamard gates H to output z.]


Moreover, we expressed in section 6.5 the opinion that showing a scalar
multiplication as occurring on a qubit line can be misleading, and this extends to
other kinds of phase transformations.

The problem is that the mazes do not scale, whereas the circuits scale but
make entanglements and some other information hard to trace. The advantage
of our functional notation for vectors and matrices is that it scales while
preserving everything. However, as experienced with functional programming
notation in ordinary classical computing, it gives less “feel” for the objects.
Hence, we will still often write out vectors and matrices in full.

The mazes are exactly the directed graphs corresponding to matrix products
that were defined at the end of chapter 3, except that the adjacency matrices
for the graphs of Hadamard stages can have $-1$ in place of $+1$. The mouse
enters a matrix U in some row i and may exit in some column j if $U[i,j] \neq 0$;
if $U[i,j] = -1$, then it picks up some cheese. Thus our mouse executes the
product of the matrices, which is expressly visualized as a sum over paths in
the graphs. This was the intuition of Richard Feynman originally with regard
to so-called S-matrices representing dynamical systems. To animate his
sum-over-paths formalism, we have named the mouse for his middle name, Phillips.
That his middle name was plural makes it work even better.
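
The statement that the mouse executes the matrix product as a sum over paths can be illustrated with a short sketch. It enumerates every route through a sequence of maze stages, multiplies the labels along each route, adds up the results, and compares with the ordinary matrix product. The function names `path_sum` and `matmul` are illustrative, and the stages use unnormalized integer entries, as in the mouse counts.

```python
from itertools import product

def path_sum(stages, entrance, exit_):
    """Sum over all paths from entrance to exit of the product of corridor labels."""
    n = len(stages[0])
    total = 0
    # Choose one intermediate level between each pair of consecutive stages.
    for middle in product(range(n), repeat=len(stages) - 1):
        nodes = (entrance, *middle, exit_)
        weight = 1
        for stage, (i, j) in zip(stages, zip(nodes, nodes[1:])):
            weight *= stage[i][j]   # 0 means there is no such corridor
        total += weight
    return total

def matmul(a, b):
    """Ordinary matrix product of two square matrices (hypothetical helper)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Two consecutive Hadamard stages on one qubit, with integer mouse counts.
H_COUNTS = [[1, 1],
            [1, -1]]
stages = [H_COUNTS, H_COUNTS]

by_paths = [[path_sum(stages, i, j) for j in range(2)] for i in range(2)]
by_product = matmul(H_COUNTS, H_COUNTS)
```

Both `by_paths` and `by_product` come out as [[2, 0], [0, 2]]: the off-diagonal zeros are the Phil and Anti-Phil paths cancelling, and the diagonal 2s are packs of two.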


7.7 Problems 

7.1. Show that the vector $a_1$ in section 7.3 is equal to

$$\frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$

Also, what does the algorithm actually do? Can you replace it by a classical 
one? 

7.2. Suppose that there is a unitary matrix U so that $a = U e_0$ where $a(x) = 1/M$
if and only if x is a prime number in the range $0, \ldots, N - 1$. What is the
value of M? What happens when we perform a measurement on a?

7.3. Draw the maze stage for a Toffoli gate using eight “levels” labeled 000 
through 111. Suppose the Toffoli gate is upside-down, that is, it can alter the 
first rather than the third quantum coordinate. Then what does the maze stage 
look like? 

7.4. Recalling the controlled-V operation from the last chapter’s exercises,

$$CV = \frac{1}{2}\begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 1+i & 1-i \\ 0 & 0 & 1-i & 1+i \end{bmatrix},$$

note how we can characterize it operationally:

CV[0a, bc] = I[b,0] I[a,c]

CV[1a, bc] = I[b,1] V[a,c].

Now let CIV stand for the same idea with the “control” on the first qubit,
but with the conditional V on the third qubit, while the second qubit is just
ignored. By oddity of notation, we cannot write this as a tensor product of the
2 × 2 identity matrix I with CV as we could if the ignored qubit were the first
or third. But we can describe it equally well operationally by

CIV[0ab, cde] = I[0,c] I[a,d] I[b,e],

CIV[1ab, cde] = I[1,c] I[a,d] V[b,e].

Write out the 8 × 8 matrix for this operation.

7.5. Show that the Toffoli gate TOF obeys the equation

$$\mathrm{TOF} = (I \otimes CV)(\mathrm{CNOT} \otimes I)(I \otimes CV^*)(\mathrm{CNOT} \otimes I)\,CIV.$$

Thus, the Toffoli gate can be written as a composition of two-qubit gates. Here 
is the famous quantum circuit diagram for this formula: 



The intuition is that if the top bit is false, then the two CNOTs and the final
CV go away, leaving CV and $CV^*$, which cancel, so the whole thing acts as
the identity. If the top bit is true but the middle bit is false, then the first CNOT
makes the middle bit true in time to activate the $CV^*$, which then cancels
with the second CV activated by the first bit. If both bits are true, then both
CV gates are activated, whereas the middle bit becomes false and inhibits the
$CV^*$. This gives the action of $V^2$, which equals X. Thus, the whole action is
CCX, which equals the Toffoli gate.

The problem posed here is to verify the equation using our matrix indexing 
notation instead and then write a 5,000-word essay comparing it to the circuit 
intuition. OK, we are kidding about the 5,000-word part. 

7.6. Deduce that the Toffoli gate can be simulated using CNOT and
single-qubit matrices without needing any ancilla qubit lines.

7.7. Use the swap gate to write CIV as a composition of matrices, each of
which is a tensor product of I with a 4 × 4 matrix. Conclude that TOF equals
a composition with each term a tensor product of I and a 4 × 4 matrix.

The next problems complete a cycle of proving the equivalence of three
famous gate sets as a basis for quantum computation and prove that matrix
entries 1, 0, $-1$ normalized by powers of $\sqrt{2}$ suffice to represent any quantum
computation’s input-output behavior. Interestingly enough, the first set is
considered the easiest to engineer with the second close behind, even though the
last set has only real phases.

• Hadamard, CNOT, and single-qubit T-gates. 

• Hadamard and CS gates. 

• Hadamard and Toffoli gates. 

7.8. Recalling problem 3.7 in chapter 3, define instead

$$\tilde{U} = R \otimes I + Q \otimes R_x(\pi).$$

Explain why $\tilde{U}$ is still unitary. Show that any measurement involving U can be
simulated by measurement(s) involving $\tilde{U}$ instead.

7.9. Compute $\tilde{S}$ and show that it is the same as $CR_x(\pi)$, that is, the controlled
version of the product XZ.

7.10. Now find $\tilde{U}$ with $U = CS$ instead.

7.11. Next show that the $\tilde{U}$ in problem 7.10 can be simulated by two Hadamard
and two Toffoli gates.

7.12. For the drumroll, show that the ~ operation commutes with matrix
multiplication. You may find it easier first to argue the same thing for the mapping
U' in problem 3.7, namely, that for any matrices A and B,

$$(AB)' = A'B'.$$


7.13. Conclude that the observation in problem 7.8 about measurements in
fact applies to entire quantum circuits of real-transformed gates. Hence,
conclude that although Hadamard and Toffoli cannot directly simulate any gate
with complex entries, they can simulate the measurements (and hence the
outputs) of any circuit involving Hadamard and CS gates.

7.14. Conclude that, although the Quantum Fourier Transform cannot be
approximated by Hadamard and Toffoli gates, because it has complex-number
entries and the latter do not, the measurement probabilities of any (feasible)
circuit that uses QFT can be closely approximated via measurements of a
(feasible) circuit involving just Hadamard and Toffoli gates in place of QFT.


7.8 Summary and Notes

This chapter has given an overview of the structure of the next chapters and
a peek at how quantum algorithms get their power: entanglement, forking,
interference (i.e., cancellation), amplification (whose significance is raised by
squaring), and the handle on algebra of size N that is given by primitives that
scale with n. All of these elements have been hotly debated in more than a
century of physics and philosophy. We have only briefly mentioned that
entanglement was puzzling to Einstein, and we passed over “Schrödinger’s Cat”
without a sniff of paradox. We somewhat agree with Stephen Hawking’s noted
quip on the latter:


When I hear of Schrödinger’s cat, I reach for my gun.


What we really reach for is material on the engineering problem of building 
quantum computers that scale, which turns on the rate at which physical noise 
can “muss the hair” of our quantum animals. For anything as large as a cat, 
it may simply be too much, but if components can be made small enough, be 
isolated enough, and run fast enough, errors caused by “noise” may be minimal 
or correctable enough to enable real machines to work as the blueprints say. 
We hosted a year-long debate about this between mathematician Gil Kalai and
computer physicist Aram Harrow on the Gödel’s Lost Letter blog in 2012.

Still, we must concede that our own animal analogy for the workings and
results of quantum algorithms has resorted to some weird features:
duplicate Phils, Anti-Phils, mutant squaring. To express the intuition of David
Deutsch for his first quantum algorithm, we would have to say further that
each Hadamard fork creates not only a duplicate Phil but also a duplicate
maze in its own parallel universe. Multiple Phils would somehow amplify and
interfere with each other through the boundaries of these universes. We would
respond more simply that it suffices to use the idea of summing squares, which
goes back before Pythagoras, and that wave interference and the two-norm of
vectors were studied even before the public birth of probability theory with
Blaise Pascal. Still, we realize that such historical grounding does not prevent
the debates from becoming deeper. Rather than itemize some of the myriad
sources for these debates, we feel the best thing is to dive in and start covering
Deutsch’s algorithm.

A similar picturing of sum-over-paths to ours is in the survey by Aharonov
(1998, 2008). Regarding the constant to divide by after “squaring Phils,” we
should say more precisely that it depends on the number of stages with
branching; the Pauli Z matrix, for example, would give “cheese” but no branching.
Matters like this are treated further in chapter 16 and its exercises. The
simulation of Toffoli gates by two-qubit gates in the exercises is from Barenco
et al. (1995). The quantum circuit diagram in problem 7.5 is the very first
example in the tutorial for the Qcircuit.tex package, which
we have used for this book. We took the diagram’s code verbatim from the
tutorial, except both it and the paper have a general double-controlled U in
place of CCX for Toffoli, because the only property the construction needs
is $V^2 = U$. Aharonov (2003), following on from Shi (2003), offers the basis
for problem 7.8 and the exercises after it. These complete a proof of the
quantum universality of Hadamard plus Toffoli gates, which formally justifies
limiting attention to “Phil” and “Anti-Phil” in visualizing measurements and is the
ground for theorems in chapter 16.

The reality of qubits is just one of many physical capabilities shown by 
experiments with entanglement. The one over sizable distances over Lake 
Geneva is by Salart et al. (2008) and is actually titled, “Testing Spooky Action 
at a Distance.” Systems that use entanglement to verify that communications 
between banks have not been eavesdropped are already in practical use. Last 
and most, we should say that the “algorithm” emphasized by Feynman (1982, 
1985) is the ability of quantum computers to simulate quantum processes in 
real time. We have not devoted a chapter to this algorithm, except insofar as 
this chapter serves.


Deutsch’s Algorithm


Deutsch’s algorithm operates on a Boolean function:

$$f : \{0,1\} \to \{0,1\}.$$

The goal is to tell whether the function is a constant by performing only one 
evaluation of the function. Clearly this is impossible in the classical model of 
computation, but the quantum model achieves this in a sense delineated below. 

This problem is important for its historical significance because it was the 
first nontrivial quantum algorithm. It also shows that quantum algorithms can 
be more efficient than classical ones, even if the advantage in this case is minor. 
The classical solution requires two evaluations of the function, whereas the 
quantum solution requires only one. This is not an impressive difference, but 
it is there, and even this small difference suggested, correctly, that far greater 
differences would be possible. For this reason, Deutsch’s algorithm retains its 
importance. So let’s start to look at it in detail. 


8.1 The Algorithm 

We will present the algorithm as computing a series of vectors $a_0, a_1, a_2, a_3$,
each of which is in the real Hilbert space $\mathbb{H}_1 \times \mathbb{H}_2$, where $\mathbb{H}_1$ and $\mathbb{H}_2$ are
two-dimensional spaces. We index vectors in this space by xy, where x and y
are single bits. For whichever Boolean function f is specified, recall again from
section 4.3 that we work with its invertible extension, which we here symbolize
as $f'(xy) = x(f(x) \oplus y)$. Thus, the “input” to the algorithm is really the choice
of f as a parameter. The algorithm always uses the same input vector and goes
as follows:

1. The initial vector is $a_0$ so that $a_0(01) = 1$.

2. The next vector $a_1$ is the result of applying the Hadamard transform on
each $\mathbb{H}_i$ of the space with $i = 1, 2$ separately.

3. Then the vector $a_2$ is the result of applying $U_{f'}$ where $f'(xy) = x(f(x) \oplus y)$.

4. The final vector $a_3$ is the result of applying the Hadamard transform again,
but this time only to $\mathbb{H}_1$.

Note that in case f is the identity function, $f'$ becomes the Controlled-NOT
function, and $U_{f'}$ becomes the 4 × 4 CNOT matrix. Because f is the identity,
we rename it $U_I$. Similarly, we write $U_X$, $U_T$, and $U_F$ for the cases f being the
negation, always-true, and always-false function, respectively. Thus, the four
possible matrices that incorporate the given function f are:


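$$U_I = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \qquad U_X = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},$$

$$U_T = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \qquad U_F = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$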

Note that the matrices $U_T$ and $U_F$ are unitary even though the always-true and
always-false functions are not reversible. This illustrates the quantum trick of
preserving the “x” argument of these functions as the first qubit and recording
$f(x)$ in terms of its effect when exclusive-or’ed with the second qubit, y.
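
The construction of the oracle matrix from f can be sketched in a few lines of Python. The function `oracle_matrix` (our own name) sends the basis index xy to x(f(x) ⊕ y) and records this as a 4 × 4 matrix of zeros and ones; `is_permutation` then checks that every row and column holds exactly one 1, which is what makes the matrix unitary even for the two constant functions. The dictionary keys I, X, T, F are labels chosen here for the identity, negation, always-true, and always-false functions.

```python
def oracle_matrix(f):
    """4 x 4 matrix of the map xy -> x (f(x) XOR y), coordinates ordered 00, 01, 10, 11."""
    matrix = [[0] * 4 for _ in range(4)]
    for x in (0, 1):
        for y in (0, 1):
            source = 2 * x + y
            target = 2 * x + (f(x) ^ y)
            matrix[target][source] = 1
    return matrix

def is_permutation(matrix):
    """True when every row and every column contains exactly one 1 and otherwise 0s."""
    entries_ok = all(entry in (0, 1) for row in matrix for entry in row)
    rows_ok = all(sum(row) == 1 for row in matrix)
    cols_ok = all(sum(col) == 1 for col in zip(*matrix))
    return entries_ok and rows_ok and cols_ok

FUNCTIONS = {
    "I": lambda x: x,        # identity
    "X": lambda x: 1 - x,    # negation
    "T": lambda x: 1,        # always true
    "F": lambda x: 0,        # always false
}

ORACLES = {name: oracle_matrix(f) for name, f in FUNCTIONS.items()}
```

All four entries of `ORACLES` pass `is_permutation`, and the entry for the identity function is the CNOT matrix.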

According to the algorithm, we will sandwich one of these four matrices
between the $H_2 \otimes H_2$ matrix on the left and the matrix for $H_2 \otimes I$ on the right.
The latter we saw in chapter 7, whereas the former is

$$\frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}.$$



The chain of three matrices is applied to the start vector $a_0 = e_{01}$ on the
right, producing in each of the four cases the vector $a_3$. A measurement of $a_3$
will then determine whether we are in one of the two constant cases, where
$U_T$ or $U_F$ is used, or whether we have one of the other two cases $U_I$ or $U_X$,
which represent the nonconstant functions f. The point again is that in contrast
to classical algorithms, which need to call f twice to evaluate $f(0)$ and $f(1)$,
the quantum algorithm can tell the difference with just one $U_{f'}$ oracle matrix,
where again $f'$ is the “controlled” version of f.
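
The whole chain can be simulated directly for all four functions. The sketch below is a plain-Python illustration under our own naming (`kron`, `matvec`, `oracle_matrix`, `prob_first_bit_zero` are not from the text): it builds the two Hadamard layers as tensor products, applies the oracle matrix once, and returns the probability that a measurement shows 0 in the first place.

```python
import math

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]
I2 = [[1, 0], [0, 1]]

def kron(a, b):
    """Tensor (Kronecker) product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def matvec(matrix, vector):
    """Multiply a square matrix by a column vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def oracle_matrix(f):
    """Permutation matrix of xy -> x (f(x) XOR y), coordinates ordered 00, 01, 10, 11."""
    matrix = [[0] * 4 for _ in range(4)]
    for x in (0, 1):
        for y in (0, 1):
            matrix[2 * x + (f(x) ^ y)][2 * x + y] = 1
    return matrix

def prob_first_bit_zero(f):
    """Run the four steps once and return Pr[measurement gives 0y for some y]."""
    a0 = [0, 1, 0, 0]                       # start vector e_01
    a1 = matvec(kron(H, H), a0)             # Hadamard on both qubits
    a2 = matvec(oracle_matrix(f), a1)       # the single oracle application
    a3 = matvec(kron(H, I2), a2)            # Hadamard on the first qubit only
    return a3[0] ** 2 + a3[1] ** 2

RESULTS = {
    "identity": prob_first_bit_zero(lambda x: x),
    "negation": prob_first_bit_zero(lambda x: 1 - x),
    "always-true": prob_first_bit_zero(lambda x: 1),
    "always-false": prob_first_bit_zero(lambda x: 0),
}
```

Up to floating-point rounding, the two constant functions give probability 1 and the two nonconstant functions give probability 0, with `oracle_matrix(f)` used exactly once per run.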


8.2 The Analysis 

First let’s invite Phil (the mouse from chapter 7) to do the analysis. Given
the input $e_{01}$, he enters at 01. In figure 8.1, we see the same maze stage at left
and right, which corresponds to a Hadamard gate on the first of two qubit lines.
The stage for Hadamard on line 2 comes after it on the left. Next, one of the
four matrices above is filled in the blank for $U_{f'}$. Each is a permutation matrix,
so its four corridors will run across with no branching, and we can already see
what they do: $U_F$ makes the corridors run straight across, $U_X$ interchanges the
top two, $U_I$ interchanges the bottom two, and $U_T$ swaps both.


Figure 8.1 

Maze for Deutsch’s algorithm. 




Phil starts running and splits into four by the time “he” reaches the gap:
Phil at 00, Anti-Phil at 01, Phil at 10, and Anti-Phil at 11. Now we can
visualize what happens when the missing stage is dropped in. Figure 8.2 shows the
diagrams for the stages corresponding to the matrices $U_I$, $U_X$, $U_T$, $U_F$ given
above.

If the stage does $U_F$ for the constant-false function, then the mice run
straight across to the last stage. Then the two Phils can each run to exit at
00, so 00 has positive amplitude 2. The two Anti-Phils are likewise the only
mice who can run to the exit for 01, and they amplify each other to give $-2$
there. Although this is negative, its square will still be 4, the same as for 00.
Because there are three cheese stages, the divisor is $2^3 = 8$, so each outcome
has probability 0.5. This already shows there cannot be any amplitude left over
for outcomes 10 or 11, but let us verify from the maze: each gets a Phil and an
Anti-Phil, which cancel.

Figure 8.2

Maze stages for possible queried functions.



The matrix $U_T$ swaps the top and bottom pairs of Phil and Anti-Phil, but
this changes nothing in the above analysis, except now 00 has $-2$ and 01 has
$+2$. So again a measurement can give only 00 or 01.

The other two matrices do only one of the swaps, however, and that changes 
the picture. Now Phil and Anti-Phil gang up to cancel the 00 and 01 outcomes, 
but they amplify for 10 and 11. Hence, a measurement certainly gives “1” in 
the first place. The set S corresponding to 0 gives rise to a 1-qubit measurement 
that perfectly distinguishes the constant-function and nonconstant cases. 

To be rigorous—and because we said “analysis by Phil” does not scale as n 
grows—we need to do the linear algebra. We state our objective formally: 

THEOREM 8.1 A measurement of the vector $a_3$ will return 0y, for some y, if
and only if f is a constant function. Thus, Deutsch’s algorithm tells whether f
is constant using just one application of $U_{f'}$.

Of course this theorem is the key: one application of f and one measurement
will tell whether f is a constant function. To save multiplying out the 4 × 4
matrices for each of the four cases (a method that doesn’t scale either) we
use our notation indexing vectors a by a(00), a(01), a(10), a(11). The proof
depends on the following lemma, in which we use binary XOR on bits to
denote a number.

Lemma 8.2 The following are true:

1. For all xy, $a_1(xy) = \frac{1}{2}(-1)^{y}$.

2. For all xy, $a_2(xy) = \frac{1}{2}(-1)^{f(x) \oplus y}$.

3. For all xy, $|a_3(xy)|^2 = \frac{1}{8}\left|(-1)^{f(0)} + (-1)^{f(1) \oplus x}\right|^2$.

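Proof. Let us prove (1). It is clear that applying Hadamard gates independently
yields

$$a_1(xy) = \frac{1}{2}\sum_{t,u} (-1)^{x \cdot t}(-1)^{y \cdot u}\, a_0(tu).$$

Thus, by the definition of $a_0$, only the term with $tu = 01$ survives, and

$$a_1(xy) = \frac{1}{2}(-1)^{x \cdot 0}(-1)^{y \cdot 1},$$

which is $\frac{1}{2}(-1)^{y}$.

Let us next prove (2). By definition of the matrix $U_{f'}$ it follows that

$$a_2(xy) = a_1(x(f(x) \oplus y)) = \frac{1}{2}(-1)^{f(x) \oplus y}.$$

Let us finally prove (3). Again by definition of the Hadamard transform,

$$a_3(xy) = \frac{1}{\sqrt{2}}\sum_{t} (-1)^{x \cdot t}\, a_2(ty) = \frac{1}{2\sqrt{2}}\sum_{t} (-1)^{x \cdot t}(-1)^{f(t) \oplus y}.$$

Note that we can expand the sum and show that it is

$$\frac{1}{2\sqrt{2}}\left((-1)^{f(0) \oplus y} + (-1)^{x \oplus f(1) \oplus y}\right).$$

We can factor out the common term $(-1)^{y}$ to get the amplitude:

$$|a_3(xy)|^2 = \frac{1}{8}\left|(-1)^{f(0)} + (-1)^{f(1) \oplus x}\right|^2.$$

□


Proof of Theorem 8.1. By lemma 8.2, $|a_3(0y)|^2$ is

$$\frac{1}{8}\left|(-1)^{f(0)} + (-1)^{f(1)}\right|^2.$$

If f is constant, then this expression is equal to $\frac{1}{2}$ for each of the two values
of y, so the outcomes 00 and 01 together have probability 1. If f is not constant,
then it is equal to 0. □


8.3 Superdense Coding and Teleportation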

We digress from our algorithms involving unknown functions to show other
actions with four alternatives. Both involve entanglement and begin with the
Hadamard plus CNOT combination detailed in chapter 7, rather than with two
Hadamard gates as in Deutsch’s algorithm. They carry out the most basic forms
of general constructions called superdense coding and quantum
teleportation.

Both applications involve a physical interpretation and realization of qubits.
As in the last chapter, let us talk of people named “Alice” and “Bob” across
Lake Geneva from each other. First consider a general product state

$$c = (a_0 e_0 + a_1 e_1) \otimes (b_0 e_0 + b_1 e_1)$$

$$\;\; = a_0 b_0 e_{00} + a_0 b_1 e_{01} + a_1 b_0 e_{10} + a_1 b_1 e_{11},$$

where $|a_0|^2 + |a_1|^2 = 1$ and $|b_0|^2 + |b_1|^2 = 1$. Here we can regard $a = a_0 e_0 + a_1 e_1$
as a qubit wholly in the control of Alice and $b = b_0 e_0 + b_1 e_1$ as a qubit
owned by Bob, with $c = a \otimes b$ standing for the joint state of the system. Thus
far, so good.

The interpretation is that the identification and ownership of qubits applies
even when the system is in a general pure state of the form

$$d = d_{00} e_{00} + d_{01} e_{01} + d_{10} e_{10} + d_{11} e_{11},$$

with $|d_{00}|^2 + |d_{01}|^2 + |d_{10}|^2 + |d_{11}|^2 = 1$. In terms of quantum
coordinates, Alice controls the first index, which plays $d_{00}, d_{01}$ against
$d_{10}, d_{11}$, while Bob controls the second index, which plays the even-index
entries $d_{00}, d_{10}$ in places 0 and 2 off against the odd entries $d_{01}, d_{11}$
in places 1 and 3. There is one other partition that plays off two against two,
the “outers” $d_{00}, d_{11}$ versus the “inners” $d_{01}, d_{10}$. This playing-off can
be achieved directly by a different kind of measurement that projects onto the
transformed basis whose four elements are given by $e_{00} \pm e_{11}$ and
$e_{01} \pm e_{10}$, each normalized by dividing by $\sqrt{2}$. This basis is named
for John Bell, who proved a famous theorem showing that the statistical results
of measuring entangled systems cannot be explained by deterministic theories
with local interactions only.

The physical realization is that after converting our usual all-zero start state
$e_{00}$ to

$$\frac{1}{\sqrt{2}}(e_{00} + e_{11}),$$

we really can give Alice a particle representing the first coordinate and shoot
Bob across the lake an entangled particle representing the second coordinate.
The experiments mentioned in the last chapter and below have demonstrated
that Alice and Bob can for some time keep these particles in this joint state.
Moreover, Alice is physically able to operate further on this state by matrix
operators applied only to her qubit, that is, operators of the form $U \otimes I$
where $U$ is a $2 \times 2$ unitary matrix.

In particular, let $U$ be one of four things: (i) $I$, (ii) $X$, (iii) $Z$, or (iv)

$$iY = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}.$$

Thus, Alice is applying one of the Pauli matrices discussed in the exercises to
chapter 3. Let Alice do one of these four things and then shoot her qubit across
the lake to Bob. Can Bob, now able to carry out multi-qubit operations such
as CNOT, figure out which one she did? The answer is yes. What he does is
“uncompute” the original entanglement and measure both qubits. Here is the
whole system expressed as a quantum circuit, this time with a standard symbol
for measurements at the end:



To show the similarity to the analysis of Deutsch’s algorithm, we draw the
corresponding maze diagram for the circuit with a missing stage, and the diagrams
for the four possible stages Alice can insert, in figures 8.3 and 8.4.

In each case, two Phils congregate at one of the exit points, except in the 
iY case, when two Anti-Phils end at 11. Because the amplitude divisor is 2, 
this already entails that the Phils at the other exit points always cancel, but 
one may enjoy verifying this from the two figures. Hence, the measurement 
always gives the same exit point depending only on the operation Alice chose. 
The main point is that Alice’s four choices lead to four different results, so that 
Bob is able to tell what Alice did. 
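The claim that Alice's four choices lead to four different results can be checked with a small simulation. This is a sketch under stated assumptions, not the book's circuit: it assumes Bob uncomputes with a CNOT followed by a Hadamard gate on the first qubit, and the function names and list-based state representation are hypothetical.

```python
from math import sqrt, isclose

# Two-qubit states are lists of four amplitudes ordered e00, e01, e10, e11,
# with Alice's qubit as the first (high-order) index.
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
IY = [[0, 1], [-1, 0]]            # i times the Pauli matrix Y
H = [[1 / sqrt(2), 1 / sqrt(2)], [1 / sqrt(2), -1 / sqrt(2)]]

def on_first_qubit(u, state):
    """Apply U (x) I: combine amplitudes that differ only in the first index."""
    return [sum(u[x][t] * state[2 * t + y] for t in (0, 1))
            for x in (0, 1) for y in (0, 1)]

def cnot(state):
    """CNOT controlled by the first qubit: swaps the amplitudes of e10 and e11."""
    return [state[0], state[1], state[3], state[2]]

def superdense_probabilities(u):
    """Probabilities of Bob's four measurement results when Alice applies u."""
    state = cnot(on_first_qubit(H, [1, 0, 0, 0]))   # entangle: (e00 + e11)/sqrt(2)
    state = on_first_qubit(u, state)                # Alice's operation U (x) I
    state = on_first_qubit(H, cnot(state))          # Bob uncomputes the entanglement
    return [abs(amp) ** 2 for amp in state]

def decode(u):
    """Return Bob's most likely result as a two-bit string and its probability."""
    probs = superdense_probabilities(u)
    best = max(range(4), key=lambda i: probs[i])
    return format(best, "02b"), probs[best]

for name, u in [("I", I2), ("X", X), ("Z", Z), ("iY", IY)]:
    print(name, *decode(u))
```

With these conventions each choice produces a single result with probability 1, and the four results are pairwise distinct, so Bob recovers two classical bits.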

Why might this be surprising? Bob has learned two bits of information as a
result of the single qubit that Alice sent across the lake. This seems to say that
the one qubit carried two classical bits of information. However, there was one
previous connection between them, via the intermediary who gave them the
entangled qubits to begin with. A result called Holevo’s theorem expresses
the deep principle that a total transmission of $n$ qubits can carry no more than
$n$ bits of classical information. Thus, there must always have been some prior
interaction between them or their environments to produce the entanglements.
Once they are in place, however, Alice can electively transmit information at
a classically impossible two-for-one rate, at the cost of consuming entanglement
resources for each pair of bits. This explains the name superdense coding.

Figure 8.3

Maze for superdense coding.

Figure 8.4

Maze stages for Pauli operators on qubit 1.

Quantum teleportation involves three qubits, two initially owned by Alice
and one by Bob. Alice and Bob share entangled qubits as before, whereas
Alice’s other qubit is in an arbitrary (pure) state $c = a e_0 + b e_1$. Alice has
no knowledge of this state, and hence cannot tell Bob how to prepare it, yet
entirely by means of operations on her side of the lake, she can ensure that
Bob can possess a qubit in the identical state.

The following quantum circuit shows the operations, with $c$ in the first quantum
coordinate, Alice’s entangled qubit second, and Bob’s last. The circuit
includes the Hadamard and CNOT gates used to entangle the latter two qubits.



With this indexing, the start state is $c \otimes e_{00}$, which equals
$a e_{000} + b e_{100}$. After the first two gates, the state is

$$c \otimes \frac{1}{\sqrt{2}}(e_{00} + e_{11}),$$

with Alice still in possession of the first coordinate of the entangled basis
vectors. The point is that the rest of the circuit involves operations by Alice alone,
including the measurements, all done on her side of the lake. This is different
from using a two-qubit swap-gate to switch the $c$ part to Bob, which would
cross the lake. No quantum interference is involved, so a maze diagram helps
visualize the results even with “arbitrary-phase Phils” lined up at the entrances
for $e_{000}$ and $e_{100}$ as shown in figure 8.5.

Because Bob’s qubit is the rightmost index, the measurement of Alice’s two 
qubits selects one of the four pairs of values divided off by the bars at the 
right. Each pair superposes to yield the value of Bob’s qubit after the two 
measurements “collapse” Alice’s part of the system. The final step is that Alice 
sends two classical bits across the lake to tell Bob what results she got, that is, 
which quadrant was selected by nature. The rest is in some sense the inverse 
of Alice’s step in the superdense coding: Bob uses the two bits to select one
of the Pauli operations $I$, $X$, $Z$, $iY$, respectively, and applies it to his qubit $c'$ to
restore it to Alice’s original value $c$.

Figure 8.5

Maze for quantum teleportation.



Neither Bob nor Alice is ever able to peek inside the qubit $c$ to read the
complex-number values of $a$ and $b$ or even get them right to more than a few
uncertain bits of accuracy, amounting to at most one bit of solid information.
This is already the essence of the natural law corresponding to Holevo’s
theorem. However, streams of qubits with prescribed values $c$ can be generated,
and experiments have shown that they can be received by Bob with high
statistical fidelity over distances of many miles. This is still vastly far from the
“Star Trek” dream of teleporting Alice across the lake, but already applications
have sprung up to profit from this unexpected benefice of nature.
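The teleportation protocol of this section can be traced on all eight amplitudes. The sketch below is an illustration under assumptions, since the circuit figure is not reproduced here: it assumes Alice applies a CNOT from her first qubit to her second and then a Hadamard gate on the first before measuring, and that her results 00, 01, 10, 11 select $I$, $X$, $Z$, $iY$ in that order. All function names are hypothetical.

```python
from math import sqrt

H = [[1 / sqrt(2), 1 / sqrt(2)], [1 / sqrt(2), -1 / sqrt(2)]]
CORRECTIONS = {                       # keyed by Alice's two measured bits
    (0, 0): [[1, 0], [0, 1]],         # I
    (0, 1): [[0, 1], [1, 0]],         # X
    (1, 0): [[1, 0], [0, -1]],        # Z
    (1, 1): [[0, 1], [-1, 0]],        # iY
}

def index(q1, q2, q3):
    """Position of the basis vector e_{q1 q2 q3} in a list of eight amplitudes."""
    return 4 * q1 + 2 * q2 + q3

def apply_single(u, qubit, state):
    """Apply the 2x2 matrix u to one qubit (numbered 1, 2, or 3)."""
    shift = 3 - qubit
    new = [0j] * 8
    for i, amp in enumerate(state):
        bit = (i >> shift) & 1
        for out in (0, 1):
            j = i ^ ((bit ^ out) << shift)   # index i with this qubit set to `out`
            new[j] += u[out][bit] * amp
    return new

def apply_cnot(control, target, state):
    """Flip the target qubit of every basis vector whose control qubit is 1."""
    new = [0j] * 8
    for i, amp in enumerate(state):
        if (i >> (3 - control)) & 1:
            i ^= 1 << (3 - target)
        new[i] += amp
    return new

def teleport(a, b):
    """For each of Alice's outcomes, return (probability, Bob's corrected qubit)."""
    state = [0j] * 8
    state[index(0, 0, 0)], state[index(1, 0, 0)] = a, b      # c (x) e00
    state = apply_cnot(2, 3, apply_single(H, 2, state))      # entangle qubits 2 and 3
    state = apply_single(H, 1, apply_cnot(1, 2, state))      # Alice's operations
    results = {}
    for (m1, m2), fix in CORRECTIONS.items():
        bob = [state[index(m1, m2, 0)], state[index(m1, m2, 1)]]
        prob = abs(bob[0]) ** 2 + abs(bob[1]) ** 2
        bob = [amp / sqrt(prob) for amp in bob]               # collapse, renormalize
        fixed = [fix[0][0] * bob[0] + fix[0][1] * bob[1],
                 fix[1][0] * bob[0] + fix[1][1] * bob[1]]
        results[(m1, m2)] = (prob, fixed)
    return results

for outcome, (prob, qubit) in teleport(0.6, 0.8j).items():
    print(outcome, round(prob, 6), qubit)
```

In this sketch each of the four outcomes occurs with probability 1/4, and after the correction Bob's qubit has the amplitudes $a$ and $b$ that Alice started with, even though neither party ever reads them.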


8.4 Problems 

8.1. Prove that no classical algorithm can solve the problem in Deutsch’s
algorithm with one evaluation of $f$ on Boolean arguments.

8.2. Directly show that the vector $a_2$ in Deutsch’s algorithm is a unit vector.

8.3. What happens if we apply the construction in Deutsch’s algorithm to one 
of the other three basis vectors? What other conclusions can we draw? 

8.4. Suppose Alice’s quantum cellphone provider charges c cents per classical 
bit sent by text, q cents per qubit sent by “quantext,” and e cents per entangled 
qubit. Work out the equations for superdense coding to be more cost-effective 
than just sending a classical text message. Then work out the conditions for
teleportation to be cheaper (including the classical bits sent by text) than
transmitting a qubit by quantext.

8.5. Can you devise a four-qubit circuit (or maze diagram with sixteen rows)
in which Alice and Bob hold two qubits each, and at the end Bob has two
copies of $c$? Or can you do it with Alice and Bob having two entangled qubits,
making three qubits for Alice and two for Bob overall, so that at the end Bob
can do operations just on his two qubits to get his two copies of $c$?


8.5 Summary and Notes 

The original algorithm of Deutsch (1985) was not exactly the one presented. 
It solved the same problem, but it did not get it exactly right. Rather it got a 
probabilistic advantage even with one evaluation. The form now ascribed to 
him was established in more general form by Deutsch and Jozsa (1992), and 
we turn to that next. 

Superdense coding originated with Bennett and Wiesner (1992), and quantum
teleportation was discussed by Bennett, Brassard, Crépeau, Jozsa, Peres,
and Wootters (1993). Among many articles on experiments with teleportation,
we mention two in Nature: Bouwmeester et al. (1997) and Marcikic et al.
(2003). Holevo’s theorem comes from Holevo (1973). One aspect is that whenever
$n$-vertex (undirected, simple) graphs $G$ are encoded with fewer than $\binom{n}{2}$
qubits, one per potential edge, the resulting quantum states $a_G$ cannot always
hold full information about $G$. Encodings $a_G$ on (many) fewer qubits than
edges can succeed only if the graphs $G$ belong to families with regular structure
or if the resultant smearing of information does not matter to approximation
properties of the algorithm.





9 The Deutsch-Jozsa Algorithm


The Deutsch-Jozsa algorithm operates on a Boolean function:

$$f: \{0,1\}^n \to \{0,1\}.$$

The goal is to tell apart the cases where the function is constant or balanced
by performing only one evaluation of the function. Here a function is balanced
if it has the same number of 1’s and 0’s as output. If neither case holds then
the output is immaterial. Clearly this goal is impossible in the classical model
of computation, even with as many as $2^{n-1}$ evaluations of $f$ on Boolean
arguments. However, it is possible in the quantum model with just one evaluation,
as we will see.

Deutsch’s algorithm was important for being the first quantum algorithm, 
even though it only barely outperformed the classical one. The Deutsch-Jozsa 
algorithm shows that the improvement can be exponentially large. This is a 
huge advance over replacing two operations by one. 

The claim of an exponential improvement is of a quantum algorithm compared
with a deterministic classical algorithm. In the worst case, a classical
algorithm might have to make an exponential number of evaluations of $f$ before
deciding whether it is balanced. However, a randomized algorithm could make
this distinction in a constant number of evaluations provided we are happy to
allow a small probability of making an error. Thus, this algorithm is another
important step but still falls short of showing that quantum algorithms can be
exponentially faster than classical algorithms if randomization is allowed for
the latter. Still, the algorithm is important, and let’s start to look at it in detail.
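The randomized classical strategy mentioned above can be sketched as follows. This is an illustrative assumption about how such a test might look, not a procedure given in the text: sample a fixed number of random inputs and report "constant" only if all outputs agree. If $f$ is balanced, all $k$ samples agree with probability $2^{-(k-1)}$, so the error shrinks exponentially in the number of evaluations while that number stays independent of $n$. The function name `looks_constant` and its parameters are hypothetical.

```python
import random

def looks_constant(f, n, samples=20, rng=random):
    """Classical randomized test for the constant-versus-balanced promise.

    Queries f on `samples` random n-bit inputs. Returns False as soon as two
    different outputs are seen (f is then certainly not constant) and True if
    all sampled outputs agree (f is constant, or we were very unlucky).
    """
    first = f(rng.getrandbits(n))
    return all(f(rng.getrandbits(n)) == first for _ in range(samples - 1))

# Example with two hypothetical functions on 16-bit inputs.
rng = random.Random(0)                      # fixed seed for reproducibility
constant = lambda x: 1
balanced = lambda x: x & 1                  # output is the lowest input bit
print(looks_constant(constant, 16, rng=rng), looks_constant(balanced, 16, rng=rng))
```

The test never errs on a constant function; its only failure mode is declaring a balanced function constant.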


9.1 The Algorithm 

We will present the algorithm as computing a series of vectors $a_0, a_1, a_2, a_3$,
each of which is in the real Hilbert space $H_1 \times H_2$, where $H_1$ has
dimension $N = 2^n$ and $H_2$ has dimension 2.

We index vectors in this space by $xy$, where $x$ is $n$ bits and $y$ is a single bit.

1. The initial vector is $a_0$ so that $a_0(0^n 1) = 1$. That is, $a_0 = e_{0^n 1}$.

2. The next vector $a_1$ is the result of applying the Hadamard transform on
each $H_i$ of the space with $i = 1, 2$ separately.

3. Then the vector $a_2$ is the result of applying $U_{f'}$ where $f'(xy) = x(f(x) \oplus y)$.

4. The final vector $a_3$ is the result of applying the Hadamard transform again,
but this time only to $H_1$.


9.2 The Analysis


We can try to analyze this with Phil the mouse starting from $0^n 1$, but this is
where the failure of the mazes to scale starts to be felt. We can visualize that
the initial Hadamard stages create a column that alternates Phil and Anti-Phil,
much as in the analysis for Deutsch’s original algorithm in the last chapter.
We can further visualize that the permutation matrices for the two constant
functions leave this arrangement the same or swap places for every Phil and
Anti-Phil, and in both cases, the mice amplify the two outcomes that begin
with $0^n$ and cancel the rest.

The balanced cases are harder to visualize, however, at least for us. How
clear is it that they all cancel all the amplitude at $0^{n+1}$ and $0^n 1$? Here is where
we hand the lead over to our linear-algebra indexing notation, for which we
again state the goal as a theorem:

Theorem 9.1 A measurement of the vector $a_3$ will return $0^n y$, for some $y$,
if and only if $f$ is a constant function. Thus, the Deutsch-Jozsa algorithm
distinguishes the cases of $f$ being constant or balanced using only one evaluation
of $U_{f'}$.

Of course this theorem is the key: one measurement will work to tell whether
$f$ is a constant function. The proof depends on the following lemma. Note how
it follows the logic of the previous chapter, where $N$ was 2. Here, $N = 2^n$.


Lemma 9.2 The following are true:

(1) For all $x, y$, $a_1(xy) = \frac{1}{\sqrt{2N}}(-1)^y$.

(2) For all $x, y$, $a_2(xy) = \frac{1}{\sqrt{2N}}(-1)^{f(x) \oplus y}$.

(3) For all $x, y$, $|a_3(xy)|^2 = \frac{1}{2N^2}\left|\sum_t (-1)^{x \bullet t}(-1)^{f(t)}\right|^2$.

Proof. Let us prove (1). It is clear that applying the Hadamard gates
independently yields

$$a_1(xy) = \frac{1}{\sqrt{2N}} \sum_{t,u} (-1)^{x \bullet t}(-1)^{y \cdot u}\, a_0(tu),$$

where we remind that $x \bullet t$ is the XOR-based inner product of Boolean strings,
whereas $y \cdot u$ involves just single bits. Thus, by the definition of $a_0$,

$$a_1(xy) = \frac{1}{\sqrt{2N}} (-1)^{x \bullet 0^n}(-1)^{y \cdot 1},$$

which equals $\frac{1}{\sqrt{2N}}(-1)^y$.

Let us next prove (2). By definition of the matrix $U_{f'}$ it follows that

$$a_2(xy) = a_1(x(f(x) \oplus y)) = \frac{1}{\sqrt{2N}}(-1)^{f(x) \oplus y}.$$

Let us finally prove (3). Again by definition of the Hadamard transform,

$$a_3(xy) = \frac{1}{\sqrt{N}} \sum_t (-1)^{x \bullet t}\, a_2(ty).$$

We can factor out the common term $(-1)^y$ to get the desired probability:

$$|a_3(xy)|^2 = \frac{1}{2N^2}\left|\sum_t (-1)^{x \bullet t}(-1)^{f(t)}\right|^2.$$

□


Proof of Theorem 9.1. By lemma 9.2, $|a_3(0^n y)|^2$ is

$$\frac{1}{2N^2}\left|\sum_t (-1)^{f(t)}\right|^2.$$

If $f$ is constant, then this expression is equal to

$$\frac{1}{2N^2}\left|\pm\sum_t (-1)^{0^n \bullet t}\right|^2 = \frac{1}{2N^2} \cdot N^2 = \frac{1}{2}.$$

Thus, the two equivalent cases $y = 0, 1$ each have probability $\frac{1}{2}$, making it
certain that the measurement yields $0^n y$. If $f$ is not constant, then it is equal to

$$\frac{1}{2N^2}\left|\sum_t (-1)^{0^n \bullet t}(-1)^{f(t)}\right|^2 = 0.$$

The sum $\sum_t (-1)^{f(t)}$ is 0 because $f$ is balanced, and so the measurement never
yields $0^n y$. □
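Theorem 9.1 can be checked on small instances by following the four steps of section 9.1 literally. The sketch below is an illustration, not the authors' code; it builds the vectors as dictionaries indexed by pairs $(x, y)$, with $x$ stored as an integer whose bits form the $n$-bit string. The names `dot` and `deutsch_jozsa_probability` are assumptions.

```python
from math import sqrt

def dot(x, t):
    """XOR-based inner product of the bit strings of the integers x and t."""
    return bin(x & t).count("1") % 2

def deutsch_jozsa_probability(f, n):
    """Probability that measuring a_3 returns 0^n y for some y."""
    N = 2 ** n
    keys = [(x, y) for x in range(N) for y in (0, 1)]
    # Step 1: a_0 = e_{0^n 1}.
    a0 = {key: 1.0 if key == (0, 1) else 0.0 for key in keys}
    # Step 2: Hadamard transform on both parts of the space.
    a1 = {(x, y): sum((-1) ** (dot(x, t) + y * u) * a0[(t, u)]
                      for t in range(N) for u in (0, 1)) / sqrt(2 * N)
          for x, y in keys}
    # Step 3: a_2(xy) = a_1(x (f(x) XOR y)).
    a2 = {(x, y): a1[(x, f(x) ^ y)] for x, y in keys}
    # Step 4: Hadamard transform on the x part; only x = 0^n is needed,
    # where every sign (-1)^(0.t) equals +1.
    a3_zero = [sum(a2[(t, y)] for t in range(N)) / sqrt(N) for y in (0, 1)]
    return sum(amp ** 2 for amp in a3_zero)

n = 3
print(round(deutsch_jozsa_probability(lambda x: 1, n), 6))          # a constant f
print(round(deutsch_jozsa_probability(lambda x: dot(x, 5), n), 6))  # a balanced f
```

The direct simulation costs time exponential in $n$, so it is only a check of the algebra, not a way to run the algorithm.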


9.3 Problems

9.1. What is the classical worst-case complexity of this problem?

9.2. Why is $\sum_t (-1)^{f(t)}$ equal to 0 when $f$ is balanced?

9.3. Suppose that $f$ is almost balanced, i.e., that the number of outputs of 0
minus the number of outputs of 1 is $o(2^n)$. Can you still use the above
algorithm?


9.4 Summary and Notes 

This algorithm is of course due to Deutsch and Jozsa (1992). The clean
demonstration of quantum capability for a task whose direct classical analogue is
infeasible in the worst case as $n$ grows drew attention to the power of quantum
computation in the years preceding Shor’s explosive discovery. The hunt
for a task whose classical version can be posed as a function that lies outside
classical randomized polynomial time led to Simon’s algorithm, which however
is likewise based on query access to a black-box function $f$. We gave a
different angle on the Deutsch-Jozsa and Simon algorithms in two posts on
the Gödel’s Lost Letter blog titled “Quantum Chocolate Boxes” and “More
Quantum Chocolate Boxes,” respectively posted at:

• http://rjlipton.wordpress.com/2011/10/26/quantum-chocolate-boxes/ 

• http://rjlipton.wordpress.com/2011/11/14/more-quantum-chocolate-boxes/ 

These posts raise the question of whether the classical analogue is limited in 
a way that is “unfair” for comparison to the algebraic resources enjoyed by the 
quantum algorithm. 





10 Simon’s Algorithm 


Daniel Simon’s algorithm detects a type of period in a Boolean function. Let

$$f: \{0,1\}^n \to \{0,1\}^n$$

be a Boolean function. We are promised that there is a “hidden vector” $s$ such
that for all $y$ and $z$,

$$f(y) = f(z) \iff y = z \text{ or } y = z \oplus s. \qquad (10.1)$$

In case $s$ is the all-zero vector, this means $f$ is bijective, whereas for all other $s$,
the promise forces $f$ to be two-to-one in a particularly simple way. In the latter
case, we say $f$ is periodic with “period” $s$.

Simon’s beautiful theorem is that $s$ can be found by a polynomial-time quantum
algorithm. In particular, the algorithm distinguishes the case $f$ is 1-to-1
from the particular cases where $f$ is 2-to-1. This is the real breakthrough,
because it beats even randomized classical algorithms. Thus, this algorithm is
important in showing the power of quantum algorithms. The problem is artificial
and not itself important, but it showed the way. So let’s look at it in detail.


10.1 The Algorithm 

Simon’s algorithm is different from the previous algorithms in that we need 
to run the quantum part many times to discover the value of s. Roughly, each 
“run” of the quantum routine gets more information about the value of s, and 
eventually we will be able to recover it. Happily, this recovery procedure is a 
classical algorithm and is quite simple: you just need to find a solution to a 
linear system. 

Initially, we have no equations for $s$. As the algorithm runs, it will
accumulate more and more equations. Eventually it will have enough equations to
solve and “find” $s$.

The algorithm operates on vectors from $H_N \times H_N$, where $N = 2^n$. As
before, we view each vector as indexed by $xy$, except now both $x$ and $y$ are
$n$-bit Boolean strings. The main body repeats until a classically verifiable
condition is met. The condition is that a set $E$ of linear equations has a unique
solution. The set $E$ uses the dot-product $\bullet$ of Boolean strings with addition
modulo 2.

1. Initialize $E$ to the empty set.

2. While $E$ does not have a unique solution, do:

2.1 Define the initial vector $a$ by

$$a(xy) = \begin{cases} \frac{1}{\sqrt{N}} & \text{when } y = f(x) \\ 0 & \text{otherwise.} \end{cases}$$

We showed how to build $a$ feasibly given a box computing $f$ in
section 6.4.

2.2 The next vector $b$ is the result of applying the Hadamard transform to
the “x” part of $a$.

2.3 Measure $b$, which gives a concrete answer $xy$.

2.4 Add the equation $x \bullet s = 0$ to the set $E$ of equations.

3. Solve the equations to obtain a unique $s$.

4. If $s = 0^n$ answer “$f$ is bijective”; otherwise by the promise, we have found
a nonzero $s$ such that $f(y) = f(z)$ whenever $z = y \oplus s$, and we can output
some such pair as witness to the answer “$f$ is not bijective.”


10.2 The Analysis 

The analysis of the algorithm is based first on the observation that

$$b(xy) = \frac{1}{\sqrt{N}} \sum_t (-1)^{x \bullet t}\, a(ty).$$

This follows because it is the definition of applying the Hadamard transform
to the first part of the space.

Theorem 10.1 Given $f$ and a hidden $s$ satisfying the promise of equation
(10.1), Simon’s algorithm finds $s$ in polynomial expected time.

The following main lemma is the key to understanding the algorithm.

Lemma 10.2 Suppose that $f$ is periodic with nonzero $s$. Then the measured
$x$’s are random Boolean strings in $\{0,1\}^n$ such that $x \bullet s = 0$.

Proof. In this case, $f$ is two-to-one. Define $R$ to be the set of $y$ such that there
is an $x$ with $f(x) = y$, i.e., $R$ is the range of the function $f$. Note that $R$ contains
exactly one-half of the possible $y$ values.

If $y$ is not in $R$, then $b(xy) = 0$, because no $t$ makes $a(ty)$ nonzero. If $y$ is in
$R$, then there are two values $z_1$ and $z_2$ so that $f(z_i) = y$ for each $i$. Further,

$$z_1 = z_2 \oplus s.$$

In this case,

$$b(xy) = \frac{1}{N}\left((-1)^{x \bullet z_1} + (-1)^{x \bullet (z_1 \oplus s)}\right)
        = \frac{1}{N}(-1)^{x \bullet z_1}\left(1 + (-1)^{x \bullet s}\right).$$

This is $\pm 2/N$ if $x \bullet s = 0$ and 0 otherwise. Thus, in this
case, we have:

$$b(xy) = \begin{cases} \pm 2/N & \text{if } y \in R \text{ and } x \bullet s = 0; \\ 0 & \text{otherwise.} \end{cases}$$

The case where $b(xy)$ is nonzero occurs exactly for half the $x$’s and for half the
$y$’s. This is as it should be; otherwise, the norm of $b$ would not be 1.

Finally, it follows that any measurement yields $xy$ with $x$ a random Boolean
string so that $x \bullet s = 0$ as claimed. □
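Lemma 10.2 can be confirmed on a small instance by computing the vector $b$ exactly from its definition. In the sketch below, the function $f(x) = \min(x, x \oplus s)$ is a hypothetical example chosen only because it satisfies the promise; the names `dot` and `simon_distribution` are likewise assumptions, and integers stand for their bit strings.

```python
def dot(x, t):
    """XOR-based inner product of the bit strings of the integers x and t."""
    return bin(x & t).count("1") % 2

def simon_distribution(f, n):
    """Exact probability of measuring each x after the Hadamard stage."""
    N = 2 ** n
    probs = [0.0] * N
    for x in range(N):
        for y in range(N):
            # b(xy) = (1/sqrt(N)) * sum_t (-1)^(x.t) a(ty), where a(ty) is
            # 1/sqrt(N) when f(t) = y and 0 otherwise.
            b = sum((-1) ** dot(x, t) for t in range(N) if f(t) == y) / N
            probs[x] += b * b
    return probs

n, s = 3, 0b101
f = lambda x: min(x, x ^ s)        # hypothetical f with f(x) = f(x XOR s)
for x, p in enumerate(simon_distribution(f, n)):
    print(format(x, "03b"), dot(x, s), round(p, 6))
```

Every $x$ with $x \bullet s = 0$ appears with probability $2/N$ and every other $x$ with probability 0, matching the uniform distribution asserted by the lemma.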


Proof of Theorem 10.1. By lemma 10.2, we accumulate random $x$ so that
$x \bullet s = 0$. Because a random vector avoids even an $(n-1)$-dimensional
subspace with probability at least one-half, the expected number of trials to obtain
a full-rank system is below $2n$, and the probability of eventual success is
overwhelming. If we are in the $s = 0$ case, then we will quickly find that out as
well. The last step, on solving for a nonzero $s$, is to generate and verify the
witness for $f$ not being 1-to-1. □


Note, incidentally, that the classical part of the algorithm gives $\{0,1\}^n$ a
vector-space structure, with bitwise XOR serving as vector addition modulo
2. This contrasts with the quantum part of the algorithm using $N$-dimensional
space for its own reckonings.
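The classical part of the algorithm is Gaussian elimination over $\mathbb{Z}_2$. The following sketch illustrates it for the case of a nonzero $s$, under the assumption that the quantum routine is replaced by a stand-in `sample` function returning uniformly random $x$ with $x \bullet s = 0$ (which is what lemma 10.2 says the measurement delivers). The function names and the bit-mask representation of vectors are hypothetical.

```python
import random

def dot(x, t):
    """XOR-based inner product of the bit strings of the integers x and t."""
    return bin(x & t).count("1") % 2

def insert_into_basis(basis, x):
    """Reduce x against the stored rows; keep it if it is independent.

    `basis` maps a bit position to a row whose highest set bit is there.
    """
    while x:
        lead = x.bit_length() - 1
        if lead not in basis:
            basis[lead] = x
            return True
        x ^= basis[lead]
    return False

def recover_period(n, sample):
    """Collect equations x . s = 0 until they determine a unique nonzero s."""
    basis, trials = {}, 0
    while len(basis) < n - 1:
        insert_into_basis(basis, sample())
        trials += 1
    # Exactly one position has no leading row; set that bit of s to 1 and
    # solve for the remaining bits from the lowest leading position upward.
    free = next(p for p in range(n) if p not in basis)
    s = 1 << free
    for lead in sorted(basis):
        lower_terms = basis[lead] & ~(1 << lead)
        if dot(lower_terms, s):
            s |= 1 << lead
    return s, trials

n, hidden = 5, 0b10110
orthogonal = [x for x in range(2 ** n) if dot(x, hidden) == 0]
rng = random.Random(1)
found, trials = recover_period(n, lambda: rng.choice(orthogonal))
print(format(found, "05b"), trials)
```

Once $n - 1$ independent equations are in hand, the solution space is $\{0^n, s\}$, so the recovered vector is $s$ no matter which samples were drawn; only the number of trials is random.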


10.3 Problems 

10.1. What is the classical complexity of this problem if one can only evaluate
$f$ as a black box? Even if we allow randomized algorithms?

10.2. Show how to construct the start vector $a$ from the elementary vector $e_0$.

10.3. Show that for any constant bit-vector $s$, $x \bullet s$ is a linear equation over the
finite field $\mathbb{Z}_2$.

10.4. Show that $b$ is a unit vector directly.

10.5. Show that with probability at least 1/2, a set of $m$ randomly selected
Boolean $n$-vectors is likely to span the whole space, provided $m = \Omega(n \log_2 n)$.

10.6. Consider the integers modulo $m = 2^n$. For every $k < n$, the multiples of
$2^k$ are closed under addition, and hence they form an additive subgroup (in
fact, an ideal) $H = H_k$ of $\mathbb{Z}_m$. For any $j < 2^k$, the set
$H_j = \{h + j \mid h \in H\}$ is a coset of $H$, with $H_0 = H$.

Now say that $f: \mathbb{Z}_m \to \mathbb{Z}_m$ obeys the hidden-subgroup promise if for some
$k$, $f$ is constant on every coset of $H_k$, and its $2^k$ values on the cosets $H_j$ are all
distinct. Design a quantum algorithm to compute the value of $k$ in $n^{O(1)}$ time.

10.7. Give a classical algorithm for the task of problem 10.6. Can you bound
its running time by a polynomial in $n$? Consider $k$ in the neighborhood of $n/2$.

10.8. Note that $f$ in problem 10.6 is periodic with period $2^k$; that is, for all
$x < m$, $f(x + 2^k) = f(x)$ (wrapping modulo $m$). Moreover, all values $f(h)$ for
$x < h < x + 2^k$ are distinct. Suppose instead that this holds with a number $r$
in place of $2^k$, where $r$ is not a power of 2. Does your quantum algorithm in
problem 10.6 still work?


10.4 Summary and Notes 

Simon’s algorithm dates to 1992 and appeared in full in Simon (1997). It
distinguishes whether a Boolean function is one-to-one from the case of its being
a special type of two-to-one. Note that $f$ can be two-to-one without having a
period $s$. If there were a quantum algorithm that could distinguish one-to-one
from two-to-one without any restrictions, then that would have consequences
for the famous graph isomorphism problem. Namely, given two graphs, we can
create functions that have hidden structure if and only if the graphs are
isomorphic. A version of Simon’s algorithm for this more general kind of periodicity
would place the graph isomorphism problem into the class BQP, which we
explore in chapter 16.

As hinted in problem 10.6, the idea of Simon’s algorithm extends to a fairly
wide range of so-called hidden subgroup problems. The situation is the same:
we are given an oracle for an $f$ that takes distinct constant values on each coset
of a subgroup $H$ of a given group $G$. The basic idea of Simon’s algorithm works
nicely when $G$ is abelian and finds tough sledding when not. Indeed, if it works
when $G$ is the symmetric group, then graph isomorphism is solved in BQP.




11 Shor’s Algorithm 


The centerpiece of Peter Shor’s algorithm detects a period in a function. Let

$$f: \mathbb{N} \to \{0, 1, \ldots, M-1\}$$

be a feasibly computable function. We are promised that there is a period $r$,
meaning that for all $x$,

$$f(x + r) = f(x).$$

The goal is to detect the period, i.e., to determine the value of $r$. Actually, we
need more than this promise. We also need that the repeating values

$$f(0), f(1), \ldots, f(r-1)$$

are all distinct. Some call this latter condition “injectivity” or “bijectivity.”
Possible relaxations of this condition are explored in the exercises, and overall
its necessity and purpose are not fully understood.

Shor’s beautiful result has many applications, including factoring integers. 
This is important because many researchers believe that factoring is far from 
being feasible for classical computation, and many popular crypto-systems and 
information-assurance applications base their security properties on this belief. 
In this chapter, we will discuss only the period-finding task; then in chapter 12, 
we will show how to use period-detection to factor integers. 


11.1 Strategy 

Shor’s algorithm, like Simon’s algorithm, has quantum and classical
components that interlink. In its original form, which we present here, the quantum
algorithm is a subroutine that is used to generate samples from an
instance-specific distribution that seems hard to emulate classically. Theorem 16.2 will
later allow making it into a “one-piece” quantum algorithm, but that is not how
we think of it originally. Here is the overall strategy:

1. Given an instance-specific $n$-bit integer $M$, use classical randomness to
generate an integer $a$ between 1 and $M - 1$. First do $\gcd(a, M)$ to allow for
the tiny chance that $a$ already shares a factor with $M$, in which case we’re
done. Otherwise we form the function $f_a(x) = a^x \bmod M$, which then has
a period $r$ that we wish to compute. All we know to begin with is that $r$
divides $\phi(M)$, the value of Euler’s totient function at $M$, which is itself
unknown; in particular, $r < M$. Many values of $a$ and $r$ are unhelpful, but with
substantial probability, $a$ will be chosen so that $r$ is computed and yields a
solution.

2. Use the classical feasibility of modular exponentiation via repeated squaring
(shown in problem 2.9) to prepare the functional superposition of $f_a(x)$
over all $x < M$.

3. Run the quantum part once and measure all qubits. The string formed by
the first $\ell$ of them, where $\ell$ is about $2 \log_2 M$, yields a particular integer $x$ in
binary encoding. With substantial probability, $x$ is “good” as defined below.

4. Then classical computation is used to try to infer $r$ from $x$. Either this
succeeds and we go to the next step or it is recognized that $x$ was not good. In
the latter case, we go back to step 3, running the quantum routine again.

5. There is still a chance the value of $r$ may be unsuitable; that is, the
original $a$ was an unlucky choice. In this case, we must begin again in step
1. But otherwise the value of $r$ provides the only needed input to a final
classical stage that yields a verifiable solution to the problem about $M$.
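The classical ingredients of steps 1 and 2 can be sketched as follows. This is an illustration under stated assumptions, not the book's procedure: `brute_force_period` is a classical stand-in for the quantum routine of steps 3 and 4, takes time exponential in the bit length of $M$, and is included only so that small examples can be inspected. All function names are hypothetical.

```python
from math import gcd
import random

def mod_exp(a, x, M):
    """Compute a^x mod M by repeated squaring."""
    result, base = 1, a % M
    while x > 0:
        if x & 1:
            result = (result * base) % M
        base = (base * base) % M
        x >>= 1
    return result

def choose_base(M, rng=random):
    """Step 1: pick a in 1..M-1 and report gcd(a, M).

    A gcd larger than 1 means a already shares a factor with M.
    """
    a = rng.randrange(1, M)
    return a, gcd(a, M)

def brute_force_period(a, M):
    """Stand-in for the quantum part: smallest r > 0 with a^r = 1 (mod M)."""
    if gcd(a, M) != 1:
        raise ValueError("a must be relatively prime to M")
    r, value = 1, a % M
    while value != 1:
        value = (value * a) % M
        r += 1
    return r

M = 21
a, shared = choose_base(M, random.Random(7))
if shared == 1:
    r = brute_force_period(a, M)
    # f_a(x) = a^x mod M repeats with period r.
    print(a, r, [mod_exp(a, x, M) for x in range(2 * r)])
else:
    print(a, "shares the factor", shared, "with", M)
```

Repeated squaring uses a number of multiplications proportional to the bit length of the exponent, which is what makes the functional superposition of $f_a$ feasible to prepare.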

There is a further important aspect of Shor’s algorithm, whose details go
beyond the scope of this primer. Like many other sources, we have stinted a
little on details about the quantum Fourier transform being feasible. In its
literal form, it involves angles in the complex plane that become exponentially
small as $n$ increases. We can create coarse approximations to these values using
a few basic gates by a process hinted in the exercises of chapters 6 and 7. The
ultimate game, not only in Shor’s full paper but in our understanding of the
“quantum power” that allows feasible solution to problems like factoring that
may be classically hard, is how these approximations interplay with the
classical techniques in stage 4. Note that stage 4 will also involve approximation.
However, this concern was not part of Shor’s original brilliant insight; it is
rather the “engineering afterward” in which there is still room for more
discoveries. If we agree a priori that the quantum Fourier transform in its pure
form is feasible, then what follows becomes a complete proof that factoring is
likewise feasible.


11.2 Good Numbers 

Let $Q$ be a power of two, $Q = 2^\ell$, such that $M^2 < Q < 2M^2$. Say an integer
$x$ in the range $0, 1, \ldots, Q-1$ is good provided there is an integer $t$ relatively
prime to the period $r$ such that

$$tQ - xr = k, \quad \text{where } -r/2 < k < r/2. \qquad (11.1)$$

The whole idea of Shor is to break down the period detection into two parts,
which are the two middle stages of the above strategy. That is, we use a quantum
algorithm to generate values $x$ that have an enhanced probability of being
good. Then we use a classical algorithm to use the good $x$ to find the period $r$.
It helps first to know how many such $x$ there are.


Lemma 11.1 There are $\Omega\left(\frac{r}{\log\log r}\right)$ good numbers.

Proof. The key insight is to think of (11.1) as an equation modulo $r$. Then it
becomes

$$tQ \equiv k \pmod{r},$$

where $-r/2 < k < r/2$. But as $t$ varies from 0 to $r - 1$, the value of $k$ can be
arranged to be always in this range, so the only constraint on $t$ is that it must
be relatively prime to $r$. The number of values $t$ that are relatively prime to $r$
defines Euler’s totient function, which is denoted by $\phi(r)$. Note that for each
value of $t$ there is a different value of $x$, so counting $t$’s is the same as counting
$x$’s. Thus, the lemma reduces to a lower bound on Euler’s function. But it is
known that

$$\phi(z) = \Omega\left(\frac{z}{\log\log z}\right).$$

Indeed, the constant in the $\Omega$ approaches $e^{-\gamma}$, where $\gamma = 0.5772156649\ldots$ is
the famous Euler-Mascheroni constant. In any event, this proves the lemma.

□


If $r$ is close to $M$, then by choosing $Q$ close to $M$ rather than $M^2$, we would
stand a good chance of finding a good $x$ just by picking about $\log\log M$ many of
them classically at random. However, this does not help when $r$ is smaller. The
genius of Shor’s algorithm is that the quantum Fourier transform can be used
to drive amplitude toward good numbers in all cases.
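The definition (11.1) and the count in lemma 11.1 can be explored on small cases. The sketch below is illustrative only: it enumerates the good numbers for a given $Q$ and $r$ by testing the two integers nearest $tQ/r$ for each $t$ relatively prime to $r$, using the strict inequalities as written in (11.1). The function names and the example parameters are assumptions.

```python
from math import gcd

def good_numbers(Q, r):
    """All x in 0..Q-1 with tQ - xr = k, t coprime to r, and -r/2 < k < r/2."""
    good = set()
    for t in range(r):
        if gcd(t, r) != 1:
            continue
        base = (t * Q) // r
        # Only the integers on either side of tQ/r can make |tQ - xr| < r/2.
        for x in (base, base + 1):
            if 0 <= x < Q and abs(2 * (t * Q - x * r)) < r:
                good.add(x)
    return sorted(good)

def totient(r):
    """Euler's totient function, counted directly from the definition."""
    return sum(1 for t in range(r) if gcd(t, r) == 1)

# Hypothetical small instances: (M, a, r, Q) = (15, 7, 4, 256) and (21, 2, 6, 512).
for Q, r in [(256, 4), (512, 6)]:
    print(Q, r, good_numbers(Q, r), totient(r))
```

In both examples the number of good $x$ equals $\phi(r)$, as the proof of the lemma predicts; the good numbers are a vanishing fraction of the $Q$ candidates, which is why picking $x$ at random does not help.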


11.3 Quantum Part of the Algorithm 

Shor’s algorithm is like Simon’s algorithm. This should not be too surprising 
because it was based on Simon’s algorithm. The key difference is to use the 
Fourier transform Fq in place of the Hadamard transform. Recall that we have 
Q — 2 C and M 2 < Q < 2 M 2 . The reason that we have chosen Q so large is to 
ensure Q/r > M. 
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We view each vector as indexed by xy, where x and y are /’-bit Boolean 
strings. The above strategy controls this routine, which is like the inner part of 
Simon’s algorithm: 

1. The start vector a is the functional superposition off, i.e.. 


a(xy) = 


-JQ 

0 


when y = fix) 
otherwise. 


2. The next vector b is the result of applying Fq to the “x” part of a. 

3. Measure b , giving an answer xy from which we discard y. 

4. Exit into a classical routine that tests whether x is a good integer—if so, 
continue with the classical stages given later or else repeat from step 1. 

We remind readers that xr in (11.1) is ordinary numerical multiplication,
whereas xy in vector indices is binary string concatenation. Although y is
discarded, the injectivity condition ensures that for every good x, the superposition
caused by $F_Q$ will contribute exactly r-many y’s. Together with lemma 11.1,
this will give a little short of order-$r^2$ good pairs xy. Hence, it suffices to show
that every good pair receives $\Omega(1/r)$ of the amplitude, giving $\Omega(1/r^2)$ in
probability. The resulting $\Omega(1/\log\log r)$ probability of getting a good number on each trial
will be large enough for the classical part to expect to succeed after relatively
few trials of the quantum part.
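To make the routine concrete, here is a minimal classical sketch that computes the distribution of the measured x for a toy instance. The instance (M = 21, a = 2, Q = 512, with period r = 6) and the helper name `x_distribution` are our own illustrative choices, and the amplitude formula assumes $\omega = \exp(2\pi i/Q)$ as in the analysis that follows. The list `peaks` uses only the closeness condition $|xr - tQ| \le r/2$, not any further requirement on t.

```python
import cmath
from collections import defaultdict

def x_distribution(f, Q):
    """Return P[x] = sum over y of |b(xy)|^2 after F_Q acts on the x part."""
    preimages = defaultdict(list)            # y -> all u in [0, Q) with f(u) = y
    for u in range(Q):
        preimages[f(u)].append(u)
    roots = [cmath.exp(2j * cmath.pi * k / Q) for k in range(Q)]   # omega^k
    probs = []
    for x in range(Q):
        p = 0.0
        for us in preimages.values():
            amp = sum(roots[x * u % Q] for u in us) / Q            # b(xy)
            p += abs(amp) ** 2
        probs.append(p)
    return probs

# Toy instance (an assumption for illustration): 2^x mod 21 has period 6,
# and Q = 512 satisfies 441 < Q < 882.
M, a, r, Q = 21, 2, 6, 512
probs = x_distribution(lambda u: pow(a, u, M), Q)

# x values for which x*r lies within r/2 of a multiple of Q
peaks = [x for x in range(Q) if min(x * r % Q, Q - x * r % Q) <= r / 2]
print(peaks, round(sum(probs[x] for x in peaks), 3))
```

Only six of the 512 possible values of x pass the closeness test here, yet they carry the bulk of the probability, which is the effect the analysis below quantifies.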


11.4 Analysis of the Quantum Part 

The intuition is that the QFT creates power series out of many angles β. Each
series creates a large locus of points 0, β, 2β, 3β, ... For most angles β, the
locus spreads itself over the circle so that its average, which is obtained by
summing the corresponding power series of complex numbers $\exp(ik\beta)$, is
close to the origin. If β is close to an integer multiple of 2π radians, however,
then the angles all stay close to 0 modulo 2π, and the average stays close to the
complex number 1. These “good” β embody multiples of the unknown period
r, and so the process will distinguish those x that yield such β. The way that r
“pans out” like a nugget of gold is similar to what happens with s in Simon’s
algorithm.




Much of the analysis can be done exactly before we take estimates to bound
the amplitudes. It suffices to show that the good cases collectively grab a
nontrivial fraction of the probability; we do not need estimates when β is “bad” at
all. Let us consider any pair xy where y is in the range of f. With $\omega = \exp(2\pi i/Q)$,
we have:


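$$b(xy) = \frac{1}{\sqrt{Q}} \sum_{u=0}^{Q-1} \omega^{xu} a(uy) = \frac{1}{\sqrt{Q}} \sum_{u:\, f(u)=y} \omega^{xu} a(uy) = \frac{1}{Q} \sum_{u \in f^{-1}(y)} \omega^{xu}.$$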


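The last $\frac{1}{Q}$ is not a typo: we have substituted the value of a(uy). Now take the
first $x_0$ such that $f(x_0) = y$. Then by injectivity,

$$f^{-1}(y) = \{x_0,\ x_0 + r,\ x_0 + 2r,\ x_0 + 3r,\ \ldots\}.$$

The cardinality of this set up to $Q - 1$ is $T = 1 + \lfloor (Q - 1 - x_0)/r \rfloor$. This brings out the
finite geometric series and enables us to apply the formula for its sum:

$$b(xy) = \frac{1}{Q} \sum_{k=0}^{T-1} \omega^{x(x_0 + rk)} = \frac{\omega^{x x_0}}{Q} \cdot \frac{\omega^{Txr} - 1}{\omega^{xr} - 1}.$$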
Note that when we take absolute values, the complex-phase factor $\omega^{x x_0}$ will go
away because it is a unit. We can multiply by further such units to make the
numerator and denominator have real values even before we take the norms,
using the trick that $\exp(i\beta) - \exp(-i\beta) = 2i\sin(\beta)$:

$$\begin{aligned}
b(xy) &= \frac{\omega^{x x_0 + xrT/2}}{Q} \left( \frac{\omega^{Txr/2} - \omega^{-Txr/2}}{\omega^{xr} - 1} \right) \\
&= \frac{\omega^{x x_0 + xrT/2 - xr/2}}{Q} \left( \frac{\omega^{Txr/2} - \omega^{-Txr/2}}{\omega^{xr/2} - \omega^{-xr/2}} \right) \\
&= \frac{\omega^{x(x_0 + (T-1)r/2)}}{Q} \left( \frac{\sin(T \cdot \pi xr/Q)}{\sin(\pi xr/Q)} \right).
\end{aligned}$$
Note that we canceled a factor of 2 both inside and outside the angles. This
finally tells us that

$$|b(xy)|^2 = \frac{1}{Q^2} \cdot \frac{\sin^2(T \cdot \pi xr/Q)}{\sin^2(\pi xr/Q)}. \tag{11.2}$$


We will also use this equation in section 13.5. It looks strange that the right-hand
side is independent of y, but recall that we did use the property that y is in
the range of f, without needing that y = f(x). By injectivity, we have r-many
such y’s for any particular x. To finish the analysis, we need to show:


1. When x is good, the right-hand side of (11.2) is relatively large.

2. The total probability on good pairs xy is $\Omega(1/\log n)$, where n is the number of
digits in M, which is high enough to give high probability of finding a good
x in $O(\log n)$-many trials.

3. If x is good, then in classical polynomial time we can determine the value
of r.


The second statement will follow quickly after the first, and we handle both 
in the next section. 
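As a sanity check on (11.2), the following sketch compares the closed form against the direct geometric sum for every x and every starting point $x_0 < r$. The parameters Q = 512 and r = 6 and the function names are our own assumptions for illustration; the case $xr \equiv 0 \bmod Q$, where the ratio of sines is 0/0, is handled by its limiting value $T^2/Q^2$.

```python
import cmath
import math

def amp_direct(x, x0, r, Q):
    """b(xy) as the geometric sum (1/Q) * sum_k omega^(x (x0 + r k))."""
    T = 1 + (Q - 1 - x0) // r
    return sum(cmath.exp(2j * math.pi * x * (x0 + r * k) / Q) for k in range(T)) / Q

def prob_closed_form(x, x0, r, Q):
    """Right-hand side of (11.2), with the 0/0 case replaced by its limit."""
    T = 1 + (Q - 1 - x0) // r
    s = math.sin(math.pi * x * r / Q)
    if abs(s) < 1e-12:                      # x r is a multiple of Q
        return (T / Q) ** 2
    return (math.sin(T * math.pi * x * r / Q) / s) ** 2 / Q ** 2

Q, r = 512, 6                               # illustrative values only
max_gap = max(
    abs(abs(amp_direct(x, x0, r, Q)) ** 2 - prob_closed_form(x, x0, r, Q))
    for x0 in range(r)
    for x in range(Q)
)
print(max_gap)                              # should be at the level of rounding error
```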


11.5 Probability of a Good Number 


We state a fact about sines that has its own interest. Note that 1.581 in radians
is a little bit more than $\pi/2$ to leave some slack. We target the number 0.63247
because its square is just above 0.4.


Lemma 11.2 For all $T > 0$ and angles $\alpha > 0$ such that $T\alpha \le 1.581$,

$$\frac{\sin(T\alpha)}{\sin(\alpha)} > 0.63247\,T.$$


Proof. The well-known inequality $\sin(\alpha) < \alpha$, which holds for all $\alpha > 0$, makes
it suffice to show that

$$\frac{\sin(T\alpha)}{T\alpha} > 0.63247.$$

Consider the function $\frac{\sin(x)}{x}$ for $0 < x \le 1.581$. Its derivative has numerator
$x\cos(x) - \sin(x)$ and denominator $x^2$. For $\frac{\pi}{2} \le x \le 1.581$ the derivative is
negative since $\cos(x)$ is not positive there while $\sin(x)$ is positive. For $0 < x < \frac{\pi}{2}$ it is also negative because the
inequality $x < \tan(x)$ holds there. Because its derivative is always negative in
this range, the function is minimized at the upper boundary $x = 1.581$, where
it has value

$$\frac{\sin(1.581)}{1.581} = \frac{0.9999479\ldots}{1.581} > 0.63247.$$


□ 
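A quick numerical probe of lemma 11.2 is easy to write. The sketch below samples integer T and angles with $T\alpha \le 1.581$ on a grid of our own choosing and records the smallest observed value of $\sin(T\alpha)/(T\sin\alpha)$; it illustrates the bound but is of course no substitute for the proof.

```python
import math

def ratio(T, alpha):
    """sin(T * alpha) / sin(alpha), the quantity bounded in lemma 11.2."""
    return math.sin(T * alpha) / math.sin(alpha)

# Sample T = 1..199 and T*alpha = 1.581 * s / 1000 for s = 1..1000 (grid is an assumption).
worst = min(
    ratio(T, 1.581 * s / (1000 * T)) / T
    for T in range(1, 200)
    for s in range(1, 1001)
)
print(worst)          # smallest sampled value of sin(T a) / (T sin a)
```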


Lemma 11.3 For all pairs xy with x good, assuming $M \ge 154$, the probability
of the measurement step outputting xy is bounded below by $0.4/r^2$.


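Proof. Recall that x being good means there is an integer t such that $-\frac{r}{2} \le
tQ - xr \le \frac{r}{2}$, and that we have $Q > Mr$ and $T = 1 + \lfloor (Q - 1 - x_0)/r \rfloor$, where $x_0 < r$.
From above, using $|\sin(x)| = |\sin(-x)| = |\sin(x + \pi)|$, we have:

$$\begin{aligned}
|b(xy)|^2 &= \frac{1}{Q^2} \cdot \frac{\sin^2(T \cdot \pi xr/Q)}{\sin^2(\pi xr/Q)} \\
&= \frac{1}{Q^2} \cdot \frac{\sin^2\!\big(T \cdot \pi(\frac{xr}{Q} - t)\big)}{\sin^2\!\big(\pi(\frac{xr}{Q} - t)\big)} \\
&= \frac{1}{Q^2} \cdot \frac{\sin^2\!\big(T \cdot \pi\frac{xr - tQ}{Q}\big)}{\sin^2\!\big(\pi\frac{xr - tQ}{Q}\big)} \\
&= \frac{1}{Q^2} \cdot \frac{\sin^2\!\big(T \cdot \pi\frac{tQ - xr}{Q}\big)}{\sin^2\!\big(\pi\frac{tQ - xr}{Q}\big)}.
\end{aligned}$$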


Now by goodness, the angle $\alpha = \pi\frac{|tQ - xr|}{Q}$ is at most $\frac{\pi r}{2Q}$. Because $T < 1 + \frac{Q}{r}$,
we have

$$T\alpha \le \frac{\pi}{2} + \frac{\pi r}{2Q}.$$

Because we chose $Q > Mr$, we have $\frac{\pi r}{2Q} < \frac{\pi}{2M} < 1.581 - \frac{\pi}{2}$ using the
condition $M \ge 154$. Thus, $T\alpha < 1.581$, so as to meet the hypothesis of lemma 11.2.
This gives us what we needed to hit our round-number probability target:

$$|b(xy)|^2 > \frac{1}{Q^2}(0.63247\,T)^2 > \frac{0.4}{r^2}.$$


□ 


Corollary 11.4 The probability of getting a good number on each trial of
the quantum part is $\Omega(1/\log\log r)$, indeed at least 1-in-$\log_2\log_2 M$.

Proof. For every good x, there are r-many different y’s for which $f^{-1}(y)$ is a
set of cardinality T in the analysis of section 11.4. Thus, by lemma 11.1, there
are $\Omega(r^2/\log\log r)$ pairs xy for which lemma 11.3 applies. A glance at the proof
plus converting natural logs to logs base 2 makes the number of such pairs at
least $\frac{2.53\,r^2}{\log_2\log_2 r}$. Thus, the total probability of getting a good number is at least

$$\frac{2.53}{2.5\log_2\log_2 r} > \frac{1}{\log_2\log_2 M}.$$

□

This finishes everything quantum in Shor’s algorithm. The remaining sections
of this chapter finish the entirely classical deduction of the exact period
from a fairly small expected number of trials.

As an aside, it is worth noting that the QFT analysis can also be conducted
in a “lighter” fashion without needing the geometric-series formula and phase
trick giving a ratio of sines for lemma 11.2. We need only go back to the first
summation formula for the probability:

$$|b(xy)|^2 = \frac{1}{Q^2}\left|\sum_{k=0}^{T-1}\omega^{x(x_0 + rk)}\right|^2.$$



Lemma 11.5 If θ is in the interval $[0, \pi]$, then $\sin(\theta) \ge 0$. Also if θ is in the
smaller interval $[\pi/4, 3\pi/4]$, then $\sin(\theta) \ge \frac{\sqrt{2}}{2}$. □

The reason that bounds on the sin function are especially important is that
one way to prove that a complex number a + ib has a large absolute value is to
show that its imaginary part b is large. Because this part of $\exp(i\theta)$ is $\sin(\theta)$, it
follows that understanding sines will play a role in our bounds.

Lemma 11.6 Suppose $0 \le xr \bmod Q \le r/2$ and $j \in \{0, 1, \ldots, \lfloor Q/r \rfloor - 1\}$.
Then the imaginary part of

$$\exp\left(\frac{2\pi i\,xrj}{Q}\right)$$

is always non-negative, and for at least half of the values j, it is bounded below
by a fixed constant.

Proof. We know that

$$0 \le xr \bmod Q \le \frac{r}{2}.$$

Let t and k be such that $0 \le k \le \frac{r}{2}$ and

$$0 \le xr - tQ = k \le \frac{r}{2}.$$

We need to give a lower bound for the imaginary part of

$$\exp(2\pi i\,xrj/Q).$$

But $xr = k + tQ$, so this is equal to

$$\exp(2\pi i(k + tQ)j/Q),$$

which by periodicity of exp is equal to

$$\exp(2\pi i\,kj/Q).$$

The imaginary part of the exponential function is $\sin(2\pi kj/Q)$. It is always
non-negative because $2kj \le Q$, and so the angle lies in the interval $[0, \pi]$. This
proves that the imaginary part is non-negative.

Finally, provided j is bounded away from 0 and Q/r, it follows that each 
term contributes at least c > 0 for some absolute c. This proves the lemma. 

□ 

Thus, the entire sum is at least T multiplied by a constant c that is bounded
away from 0, so

$$|b(xy)|^2 \ge \frac{c^2 T^2}{Q^2} \ge \frac{c^2}{r^2},$$

whereupon the rest is similar to before.

11.6 Using a Good Number 

Now we can finish the analysis of the inner classical routine in proving the 
third statement at the end of section 11.4. 

Lemma 11.7 If x is good, then in classical polynomial time, we can determine
the value of r.

Proof. Recall that x being good means that there is a t relatively prime to r so
that (by symmetry)

$$xr - tQ = k \quad\text{where}\quad -\frac{r}{2} \le k \le \frac{r}{2}.$$

Assume that $k \ge 0$; the argument is the same in the case it is negative. We can
divide by rQ and get the inequality

$$\frac{x}{Q} - \frac{t}{r} \le \frac{1}{2Q}.$$
We next claim that r and t are unique. Suppose there is another $t'/r'$. Then

$$\left|\frac{t}{r} - \frac{t'}{r'}\right| \ge \frac{1}{rr'} > \frac{1}{M^2}.$$

But both fractions are within $\frac{1}{2Q}$ of $x/Q$, so they differ by at most $\frac{1}{Q}$, which makes Q smaller than $M^2$, a
contradiction.

Because r is unique, it follows that t is too. So we can treat

$$xr - tQ = k$$

as an integer program in a fixed number of variables: the variables are r, t, and
two slack variables used to state

$$-r/2 \le k \le r/2$$

as two equations. While integer programs are hard in general, for a fixed number
of variables, they are solvable in polynomial time. This proves the lemma.

□ 

Usually this lemma is proved without recourse to integer programming. 
Instead, most sources use the special structure of the equation and argue that 
it can be solved by an elementary result in number theory. This comes from a 
classical problem in Diophantine approximation theory: given a fraction x/Q, 
find the best approximation to it with the denominator of a certain size. This 
is exactly what is needed here, and it can be done by technology based on 
continued fractions. 


11.7 Continued Fractions 


This section is optional. If you believe that an integer program in a fixed number
of variables is easy to solve, then there is no need to read on about this
alternative method. Or if you know about continued fractions, then this is a 
repeat for you. However, if you wish to see how the approximation problem 
can be solved, then read away. 

Here is a classic continued fraction:

$$a + \cfrac{1}{b + \cfrac{1}{c + \cfrac{1}{d + \cdots}}}$$
Given any real number α, the main result of the theory of continued fractions is
that a series of fractions can be generated that get closer and closer to α. Indeed,
if α is a rational number, then the sequence always terminates. Let $\frac{a_k}{b_k}$ be the
k-th such fraction. The process is called the continued fraction algorithm,
and its analysis is conveyed by the following theorem statement.

Theorem 11.8 The fraction $\frac{a_k}{b_k}$ can be generated in at most a polynomial
number of basic arithmetic operations. Further, the distance from α to this
fraction decreases exponentially fast. Also it is the best approximation to α in
the following sense: if

$$\left|\alpha - \frac{c}{d}\right| < \left|\alpha - \frac{a_k}{b_k}\right|,$$

then $d > b_k$. □


Let’s see whether this is enough to solve the approximation problem we
face. Let $\alpha = \frac{x}{Q}$. Suppose we run the continued fraction algorithm until $\frac{a}{b}$ is
within $\frac{1}{2Q}$ of α. We argue that b must equal r. Suppose that $b \le r$. We know
that

$$\left|\frac{a}{b} - \frac{t}{r}\right| \le \frac{1}{Q}$$

because both terms are close to α. Then it follows that

$$|ar - bt| \le \frac{br}{Q} < 1.$$

Because $ar - bt$ is an integer, it must be 0, so $\frac{a}{b} = \frac{t}{r}$; as both fractions are in
lowest terms, this implies that $b = r$. Thus, by continued fractions, one can compute the
period exactly with effort bounded by a polynomial in the number of bits in M.
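The continued fraction step is short enough to sketch in code. The helper names `convergents` and `recover_period` are ours, and the stopping rule used here (keep the last convergent whose denominator is below M) is one common way to realize the approximation step, not necessarily the exact rule intended above. The numbers come from a toy instance with M = 21, r = 6, and Q = 512, for which x = 85 and x = 427 are close to Q/6 and 5Q/6.

```python
from fractions import Fraction

def convergents(num, den):
    """Yield the continued-fraction convergents of num/den as Fractions."""
    h_prev, h = 1, num // den          # numerators   h_{-1}, h_0
    k_prev, k = 0, 1                   # denominators k_{-1}, k_0
    yield Fraction(h, k)
    num, den = den, num % den
    while den:
        a = num // den                 # next partial quotient
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        yield Fraction(h, k)
        num, den = den, num % den

def recover_period(x, Q, M):
    """Denominator of the last convergent of x/Q whose denominator is below M."""
    best = None
    for c in convergents(x, Q):
        if c.denominator >= M:
            break
        best = c
    return best.denominator

print(list(convergents(85, 512)))      # convergents of 85/512
print(recover_period(85, 512, 21), recover_period(427, 512, 21))
```

If the hidden numerator t shares a factor with r, the same procedure returns only a proper divisor of r, which is why the definition of a good x asks for t relatively prime to r.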


11.8 Problems 

11.1. Suppose you have a routine R that correctly computes the period r of any
given function $f: [N] \to [N]$ only when r is odd and works in $O((\log N)^2)$ time.
Create a routine R' that works for even r as well. What is its running time?

11.2. Now suppose that R outputs the correct r with probability (at least) 3/4,
outputting “fail” otherwise. What running time must your R' have now?

11.3. Show that if period detection is feasible for functions f that violate
injectivity, then SAT can be solved by a polynomial time quantum algorithm.

11.4. Suppose that f violates the injectivity assumption only by having one
value appear twice. Does Shor’s algorithm still work?

11.5. In the alternate analysis at the end of section 11.5, what value of the
constant c can you achieve? Compare it with the constant 0.4 obtained in
lemma 11.3.

11.6. For M = 21, how many values a relatively prime to M are there? Chart 
the periods r that you get for each one. 

11.7. This problem is for those who know continued fractions or wish to learn 
more. Show that the continued fractions generated in the theorem need not be 
the best approximation to the number. 


11.9 Summary and Notes 

Shor’s brilliant result appeared in the 1994 FOCS conference (Shor, 1994) and 
then in full journal form (Shor, 1997). It did more than almost anything else 
to create the excitement around the field of quantum algorithms. The ability 
to use it to break cryptographic systems captured the imagination of many 
researchers and the concern of many others. 

The algorithm has been generalized in modest ways. The first of us jointly
showed (Boneh and Lipton, 1996) that the injectivity assumption could be
relaxed to allow any value to occur a polynomial in n times. Later even stronger
generalizations were found for period detection. The theorem that integer programs
with a fixed number of variables are polynomial-time solvable was
proved by Lenstra (1983).

The full article (Shor, 1997) cites work (Coppersmith, 1994, Griffiths and
Niu, 1996) showing that the algorithm works even when the quantum Fourier
transform is replaced by a fairly coarse approximation. Note that the QFT
ostensibly requires a principal N-th root $\omega_N$ of unity for exponential-size N.
No device is fine enough to carry out a rotation by $\omega_N$ for large N, but these
and other sources have shown that a series of larger steps, each of moderate
precision, can achieve the same mathematical effect we proved using the exact
QFT. There are, however, still extensive debates about the feasibility of all this
in practice, which we have referenced in numerous posts on the Gödel’s Lost
Letter blog. The focus on M = 21 in the exercises, recalling section 1.4, comes
because at this time of writing, 21 is the highest number for which a practical
run of Shor’s algorithm has been claimed, although even this has been questioned
(Smolin et al., 2013).




12 Factoring Integers 


In this chapter, we will present the most famous application of the period-finding
algorithm of Shor: the ability to factor integers in quantum polynomial
time. The reduction of factoring to period discovery is really a nice example 
of computational number theory, one that was known a decade before Shor 
applied it. His genius was in the realization that he could compute periods fast 
via quantum algorithms. 


12.1 Some Basic Number Theory 

We need some standard facts and notation from elementary number theory. As
usual x mod M is the residue of x modulo M, and $x \equiv y \bmod M$ means that x
and y have the same residue modulo M. The greatest common divisor of x and
y, written gcd(x, y), is the largest natural number that divides both x and y, and
can be found via Euclid’s algorithm in time quadratic in the lengths of x and
y. The numbers $\{x \mid \gcd(x, M) = 1\}$ form a group under multiplication modulo
M. If gcd(x, y) = 1, then we say that x and y are relatively prime to each other.
Every element x relatively prime to M has a finite smallest positive number $\ell$ so that
$x^\ell \equiv 1 \bmod M$. We will use $\mathrm{ord}_M(x)$ to denote this number.

If p is prime, then the nonzero numbers modulo p, that is, the numbers
$1, \ldots, p - 1$, form a cyclic group under multiplication. Being cyclic means that
they can all be written as powers of some element. One further important fact
is the so-called Chinese remainder theorem: given distinct primes p, q and any
elements x, y,

$$x \equiv y \bmod pq \iff x \equiv y \bmod p \ \text{ and } \ x \equiv y \bmod q.$$

We also need to define quadratic residues and state Euler’s criterion. It suffices
to define them modulo an odd prime p. A number a, $1 \le a \le p - 1$, is a
quadratic residue (mod p) if there is an integer x such that $x^2$ is congruent to
a modulo p. Euler’s criterion states that this is true if and only if

$$a^{\frac{p-1}{2}} \equiv 1 \pmod{p}.$$

For quadratic nonresidues, the right-hand side is $-1$ modulo p. It is important
again that by repeated squaring, a classical algorithm can compute the left-hand
side in time polynomial in log p (see problem 2.9).
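Euler’s criterion is easy to try out. In the sketch below, Python’s built-in three-argument `pow` performs the modular exponentiation by repeated squaring; the prime p = 23 and the function name are arbitrary choices of ours. The two sets compare the criterion against the definition by brute-force squaring.

```python
def is_quadratic_residue(a, p):
    """Euler's criterion for an odd prime p and 1 <= a <= p - 1."""
    return pow(a, (p - 1) // 2, p) == 1

p = 23                                           # any odd prime works here
by_euler = {a for a in range(1, p) if is_quadratic_residue(a, p)}
by_squaring = {x * x % p for x in range(1, p)}   # the definition, by brute force
print(sorted(by_euler))
print(by_euler == by_squaring)
```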

Another way to view what is going on is that the quadratic residues are the
even powers of any generator of the cyclic group. Note that it is possible to
decide whether a is an even power without needing to compute a generator g
or take logarithms to base g modulo p. The latter task, called the discrete logarithm
problem (see problem 12.5 below), is also solved by Shor’s algorithm,
but not even quantum methods are known to be able to find a generator with
high probability in polynomial time.


12.2 Periods Give the Order 

Let’s turn now to the question of factoring the number M = pq where p and q
are distinct odd primes. The general case works essentially in the same way,
so it is reasonable to prove only this special case. Also this is the case of most
interest to cryptography, so even if we could only do this case that would be
important, but the methods easily can handle the case when M is divisible by
many primes.

Define the following function $f_a(x) = (a^x \bmod M)$, where a is relatively prime to M. This function has several
key properties. It is periodic because

$$f_a(x + r') = f_a(x)$$

for any x, where $r' = (p - 1)(q - 1)$. This follows from the fact that $a^{p-1} \equiv
1 \bmod p$ and $a^{q-1} \equiv 1 \bmod q$ and from the Chinese remainder theorem. Thus,
the function must have a minimal period r that divides r'.

The r, not the r', is the value that Shor’s algorithm returns. Even to get
r, we must prove that the function’s values $f_a(0), \ldots, f_a(r - 1)$ are distinct. By
way of contradiction, suppose that

$$f_a(x) = f_a(y),$$

where $0 \le x < y \le r - 1$. By definition it follows that

$$(a^x \bmod M) = (a^y \bmod M),$$

so that $a^x \equiv a^y \bmod M$. Thus, $a^{y-x} \equiv 1 \bmod M$, and it follows that there is a
smaller period than r, which is a contradiction.
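For small M the minimal period can be found by brute force, which makes the two claims above easy to check by hand. The sketch below is purely classical and exponentially slower than the quantum routine; the function name `order` and the instance M = 21, a = 2 are our own choices.

```python
from math import gcd

def order(a, M):
    """Smallest r >= 1 with a^r = 1 (mod M); requires gcd(a, M) = 1."""
    if gcd(a, M) != 1:
        raise ValueError("a must be relatively prime to M")
    r, power = 1, a % M
    while power != 1:
        power = power * a % M
        r += 1
    return r

p, q, a = 3, 7, 2
M = p * q
r = order(a, M)
r_prime = (p - 1) * (q - 1)
values = [pow(a, x, M) for x in range(r)]    # f_a(0), ..., f_a(r - 1)
print(r, r_prime % r == 0, values)
```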


12.3 Factoring 

Suppose that M = pq where p, q are odd primes; the general case is similar.
There really is one theme that is used over and over in most methods of factoring:
try to construct an integer x so that p divides x and q does not. Then the
value of gcd(x, M) will equal p, and M is factored. Of course the roles of p and
q can be exchanged; the key is only that one prime divides x and the other
does not. Note that this relies on the ability to compute the gcd of two integers
in polynomial time.

Let’s look at how M can be factored provided we come upon a multiple r of
$p - 1$ or $q - 1$. Define A(r, p, q) to be true when only one of $p - 1$ and $q - 1$
divides r. Also define B(r, p, q) to be true when both divide r, and furthermore

$$\frac{r}{p - 1} \quad\text{and}\quad \frac{r}{q - 1}$$

are both odd numbers. The rationale for these definitions is the following two
lemmas:


Lemma 12.1 There is a randomized algorithm $A^*(r, M)$ that factors M with
probability at least one-half, provided A(r, p, q) is true.

Proof. Assume that A(r, p, q) is true and $p - 1$ divides r. Pick a random a in
$1, \ldots, M - 1$, and compute $\gcd(a^r - 1, M)$. We claim that at least half the time
this is a factor of M. Picking a by the Chinese remainder theorem is equivalent
to picking a modulo each prime separately. Now $a^r \equiv 1 \bmod p$ because $a^{p-1} \equiv
1 \bmod p$ and $p - 1$ divides r. So we need to show that

$$a^r \equiv 1 \bmod q$$

is true only at most half the time. Because $q - 1$ does not divide r, there is
some b so that $b^r \not\equiv 1 \bmod q$, using that $\mathbb{Z}_q^*$ is a cyclic group of order $q - 1$.
But then the set of b so that $b^r \equiv 1 \bmod q$ is a proper subgroup, and so has at
most half the elements modulo q. This proves the lemma. □

Lemma 12.2 There is a randomized algorithm $B^*(r, M)$ that factors M with
probability at least one-half, provided B(r, p, q) is true.

Proof. Assume that B(r, p, q) is true. Pick a random a in $1, \ldots, M - 1$, and
compute $\gcd(a^{r/2} - 1, M)$. We claim that at least half the time this is a factor
of M. Because B(r, p, q) is true, there is an $\ell$ so that $(p - 1)\ell = r$ and $\ell$ is odd
and an m so that $(q - 1)m = r$ and m is also odd. Again picking a randomly
means that at least half the time a will be a quadratic residue modulo one of
the primes and a nonresidue modulo the other. So assume that a is a quadratic
residue modulo p and a nonresidue modulo q. Then,

$$\begin{aligned}
a^{(p-1)/2} &\equiv 1 \bmod p \\
a^{(q-1)/2} &\equiv -1 \bmod q
\end{aligned}$$

by Euler’s criterion. The last step is to note that by definition of $\ell$ and m, and because both are odd, raising
these to the powers $\ell$ and m gives:

$$\begin{aligned}
a^{(p-1)\ell/2} &\equiv 1 \bmod p \\
a^{(q-1)m/2} &\equiv -1 \bmod q,
\end{aligned}$$

that is, $a^{r/2}$ is 1 modulo p but $-1$ modulo q.
Hence, we can apply the same reasoning as for lemma 12.1 to obtain a factor
at least half the time. □


Thus, our goal is to find an r so that A(r, p, q) or B(r, p, q) is true. We start
with r so that $p - 1$ divides r. Define $r_i = r/2^i$, and let k be so that $r_k$ is odd. We
run $A^*(r_i, M)$ for all $i = 0, \ldots, k$. Then we run $B^*(r_{k-1}, M)$. If any try yields a
factor, we are done. Otherwise, we try the process again. The final point is that
this process works at least one-half the time.

Let’s prove that. Initially, $p - 1$ divides r by assumption. Thus, $q - 1$ must
also or A(r, p, q) is true. Assume that $p - 1$ and $q - 1$ both divide $r_i$ for all
$i = 0, \ldots, k - 1$, and let at least one fail to divide $r_k$. If only one fails, then
$A(r_k, p, q)$ is true, and we are done. So they both must fail to divide $r_k$. Note
that $(p - 1)\ell = r_{k-1}$ for some $\ell$. The value of $\ell$ must be odd because $p - 1$
fails to divide $r_k$. In the same way, it follows that $(q - 1)m = r_{k-1}$ for some
odd m. So it follows that $B(r_{k-1}, p, q)$ is true. This will prove that we have at least
a one-half chance to find a factor, which for M = pq will be p or q itself.

Because it is simple to verify that the number we get divides M, we can do
$O(\log n) = O(\log\log M)$ trials of the entire algorithm to ensure success
probability at least 3/4. Further trials can amplify the success probability close to
1, technically pushing the theoretical failure probability below $2^{-n^c}$ for any
preset exponent c. Thus, everything in this and the previous chapter combines
to prove:

Theorem 12.3 Given any integer M, Shor’s algorithm finds a factor of M 
with high probability in quantum polynomial time. □ 
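The classical post-processing of this section can be sketched in a few lines. The function names and the decision to reuse one random base a along the whole chain $r, r/2, r/4, \ldots$ are our own simplifications, not the procedure exactly as stated above. The sketch relies on the observation that the gcd computed by $B^*$ on $r_i$, namely $\gcd(a^{r_i/2} - 1, M)$, is the same number that $A^*$ computes on $r_{i+1}$, so a single loop over the halvings covers both tests.

```python
import random
from math import gcd

def try_factor(r, M, a):
    """Run the gcd test on r, r/2, r/4, ... for one base a; return a factor or None."""
    g = gcd(a, M)
    if g > 1:
        return g                       # lucky draw: a already shares a factor with M
    while True:
        g = gcd(pow(a, r, M) - 1, M)   # the test of lemma 12.1 on the current r
        if 1 < g < M:
            return g
        if r % 2:                      # r is odd: no further halving possible
            return None
        r //= 2

def factor_from_multiple(r, M, rng, trials=50):
    """Repeat try_factor with fresh random bases; r should be a multiple of p - 1 or q - 1."""
    for _ in range(trials):
        g = try_factor(r, M, rng.randrange(1, M))
        if g is not None:
            return g
    return None

rng = random.Random(0)                 # fixed seed only to make the run repeatable
print(factor_from_multiple(8, 15, rng), factor_from_multiple(12, 21, rng))
```

The two calls use the inputs of problems 12.3 and 12.4, so the printed values can be compared with a hand computation.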


12.4 Problems 

12.1. Show that for any odd prime p, if x and y are both quadratic nonresidues
modulo p, then xy is a quadratic residue modulo p.

12.2. Suppose that $a^k \equiv 1 \bmod p$ and $a^\ell \equiv 1 \bmod p$ for some odd prime p and
some k, $\ell$. Can k and $\ell$ be relatively prime?

12.3. Show what the classical part of Shor’s algorithm does on M = 15 given
r = 8.

12.4. Show what the classical part does on M = 21 with r = 12.

12.5. Let g be a generator of the cyclic group $\mathbb{Z}_p^*$ of the integers $1, \ldots, p - 1$
under multiplication modulo p. Define the discrete logarithm (base g) of
any $x < p$ to be the unique number $r < p$ such that $g^r = x$ in $\mathbb{Z}_p^*$. Now given
p, g, x with the task of finding the unknown r, define $f_x(a, b) = g^a x^{-b}$. Show a
sense in which $f_x$ is periodic with period r, and find a relevant Abelian “hidden
subgroup” of $\mathbb{Z}_p^* \times \mathbb{Z}_p^*$. (If you are ambitious, go on to reprise the strategy of
chapter 11 to compute r in quantum expected polynomial time, thus solving
the discrete logarithm problem.)


12.5 Summary and Notes 

That getting the order is enough to factor traces back at least to 1984 (Bach
et al., 1984). Our exposition of its theorem that given any multiple of $p - 1$ it
is possible to factor M also appeared in a post by us on the Gödel’s Lost Letter
blog: “A Lemma on Factoring,” http://rjlipton.wordpress.com/2011/12/10/a-lemma-on-factoring/.





13 Grover’s Algorithm 


The problem that Grover’s algorithm solves is finding a “needle in a haystack.” 
Suppose that we have a large space of size N, and one of the elements is special. 
It may be a solution to some problem that we wish to solve—one we could 
verify if we knew it. Then a classical algorithm in worst case would have to 
examine all the elements, whereas even a randomized algorithm expects to 
look at N/2 elements. 

The power of quantum algorithms, as discovered by Lov Grover, is that
N/2 can be improved to $O(N^{1/2})$. Compared with a classical random-search
algorithm, the factor 1/2 goes into the exponent, which is a huge improvement.
We will now explain how and why the algorithm works.


13.1 Two Vectors 

Define the “hit vector” h by h(x) = 1 if x is the solution, or if x belongs to a
possibly-larger set S of solutions, and h(x) = 0 otherwise. Provided the number
k of solutions is nonzero, dividing by $\sqrt{k}$ makes h a unit vector, and hence a
legal quantum state. If we could build this state, then measurement would yield
a solution with certainty.

The goal is to build a state h' close enough to h so that measuring h' will
yield a solution with reasonable probability. The issue is that if we prepare a
state a at random, then it is overwhelmingly likely to be far from h. Measuring
a random a is like guessing an $x \in [N]$ at random, and the probability k/N of its
success is tiny unless k is huge.

What Grover’s algorithm does is start with a particular vector j and jiggle
it in a way that “attracts” it to h. How can we do this if we don’t know anything
about h in advance? Actually, we do know something: all entries of h
that are indexed by solutions have the same value. Moreover, and arguably
more important, the entries corresponding to nonsolutions also agree on their
value. Call these two statements together the solution-smoothness property.
Now the j we start with has all of its entries equal to $1/\sqrt{N}$, which guarantees
solution-smoothness even though we have no idea where the solutions (and
nonsolutions) are. By linearity, it follows that:


Every vector a in the two-dimensional 
subspace spanned by h and j has the 
solution-smoothness property. 
Having those equal entries, as we saw in section 5.5, enables the reflection
of any such a about h to be computed via the Grover oracle. Actually, recall
that if S is the set of which h is the characteristic vector, then reflection about
h needs the Grover oracle of the complement of S. However, we can instead
complement h in the following sense: Define the “miss vector” m to have entry
$1/\sqrt{N - k}$ for each nonsolution, 0 for each solution. Then m also belongs to
the subspace, because

$$m = \frac{1}{\sqrt{N - k}}\left(\sqrt{N}\,j - \sqrt{k}\,h\right),$$

and importantly, it is orthogonal to h. Now reflection about m uses the original
Grover oracle $U_S$, which we recall is defined by

$$U_S[x, x] = \begin{cases} -1 & \text{if } x \in S; \\ 1 & \text{otherwise,} \end{cases}$$

with all off-diagonal entries zero. From theorem 5.6 in section 5.4, which was
proved in section 6.5, we know that computing the Grover oracle is feasible.
This is because testing whether a given x is in S is carried out by a feasible
Boolean function f(x), notwithstanding our difficulty of finding any $x \in S$. This
is how a quantum algorithm is able to avail itself of information about h,
information that is not as trivial as we might have supposed.

We also know that reflection about j is a feasible unitary operation. These 
two reflection operations supply the “jiggle” that we need. Finally, what helps 
the analysis is that h and m form an orthogonal basis for H, and it will be 
convenient to describe the action geometrically with respect to this basis. 

This is our first foray outside the standard basis of any vector space, but
having only two dimensions makes it easy. Think of m as the “x-axis” and h
as the “y-axis,” and note that j is somewhere in the positive region between m
and h. That is, j makes an angle α with m such that $0 < \alpha < \frac{\pi}{2}$. The cosine of
α is given by the inner product of j with m, which depends only on k:

$$\langle j, m \rangle = \sqrt{\frac{N - k}{N}}.$$

Put another way, $\sin^2(\alpha) = \frac{k}{N}$, which was just the success probability of random
guessing. But after initializing a to j, if we can rotate a to make its angle
θ with m to be close to $\frac{\pi}{2}$, then $\sin^2(\theta)$, which is always the success probability
that measuring a gives a valid solution, will be close to 1. The algorithm
achieves this by the geometrical principle that reflections around two different
vectors yield a rotation.
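The relations among h, j, and m can be checked directly on a small example. In this sketch the search-space size N = 16 and the solution set S are arbitrary choices of ours, and the vectors are plain Python lists indexed by x.

```python
import math

N, S = 16, {3, 9, 12}            # hypothetical toy search space and solution set
k = len(S)

h = [1 / math.sqrt(k) if x in S else 0.0 for x in range(N)]       # hit vector
j = [1 / math.sqrt(N)] * N                                        # start vector
m = [0.0 if x in S else 1 / math.sqrt(N - k) for x in range(N)]   # miss vector

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# m written as a combination of j and h, as in the displayed identity
combo = [(math.sqrt(N) * jx - math.sqrt(k) * hx) / math.sqrt(N - k)
         for jx, hx in zip(j, h)]

alpha = math.acos(dot(j, m))     # angle between j and m
print(dot(h, m), dot(j, m), math.sin(alpha) ** 2, k / N)
```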




13.2 The Algorithm 



We first state the algorithm supposing that the number k of solutions is known.
Indeed, Grover originally presented his algorithm in the case where k is foreknown
to be 1, that is, when there is a unique solution. Once we understand
the mechanism, we will see what to do when k is not known in advance.

1. Initialize the vector a to be the start vector j.

2. Compute $\alpha = \sin^{-1}\left(\sqrt{k/N}\right)$ and $t = \lfloor \frac{\pi}{4\alpha} \rfloor$.

3. Repeat the following Grover iteration t times:

3.1 Apply $\mathrm{Ref}_m$ to a via the Grover oracle $U_S$, obtaining the vector a'.

3.2 Apply $\mathrm{Ref}_j$ to a', obtaining the new value of a.

4. Measure the final state a, giving a string $x \in \{0,1\}^n$.

5. If $x \in S$ stop: we have found a solution. Otherwise repeat the entire process.

Tacit here is that if $t < 1$, then the inner loop falls through and we measure
right away. This happens only when $\frac{\pi}{4\alpha}$ is below 1, i.e., $\alpha > \frac{\pi}{4}$, so
$k/N > \frac{1}{2}$. In this case, the measurement amounts to guessing uniformly at random,
which then succeeds with probability at least 1/2. We will show that this
success probability also applies when the inner loop is run. Note that the value
for t is the same as $\frac{\pi}{4\alpha} - \frac{1}{2}$ rounded to the nearest integer, and the $-\frac{1}{2}$ part
comes because we start with θ = α not θ = 0.


13.3 The Analysis 

Let θ be the angle between m and the current state a before any iteration of
the inner loop. Suppose $\alpha \le \theta < \frac{\pi}{2}$; note that initially θ = α. Because m is
our x-axis, the Grover reflection puts a' at angle −θ, which is stepping back to
spring forward. Because its distance from j is now α + θ, the reflection about j
doubles that and adds it to −θ. The new angle is hence

$$\theta' = -\theta + 2\alpha + 2\theta = \theta + 2\alpha,$$

which means that the two reflections have effected a positive rotation by 2α.
The final angle θ will hence lie within ±α of $\frac{\pi}{2}$. Because $\alpha \le \frac{\pi}{4}$ for any iteration
to happen at all, the success probability $\sin^2(\theta)$ is at least 1/2.

Finally, the inequality $\sin(x) \le x$ means $\alpha \ge \sqrt{k/N}$, so $t \le \frac{\pi}{4}\sqrt{N/k}$. It
follows that even when k = 1, we have proved:


Theorem 13.1 Given a function $f: \{0,1\}^n \to \{0,1\}$ from a feasible family,
and given k = |S| where $S = \{x \mid f(x) = 1\}$, Grover’s algorithm finds a member
of S in an expected number O(T) of iterations, where $T = 2^{(n - \log_2 k)/2}$, and in
overall time $T n^{O(1)}$. □
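The rotation picture can be checked by simulating the two reflections on explicit vectors. This sketch keeps the full N-dimensional real state as a list; the oracle step negates the entries indexed by S, which agrees with reflection about m on the subspace spanned by h and j. The instance N = 64 with a single solution and the function names are our own choices.

```python
import math

def reflect_about(v, a):
    """Ref_v a = 2<v,a> v - a for a real unit vector v."""
    c = 2 * sum(vi * ai for vi, ai in zip(v, a))
    return [c * vi - ai for vi, ai in zip(v, a)]

def grover(N, S):
    """Run the known-k algorithm; return (t, alpha, probability of measuring a solution)."""
    k = len(S)
    alpha = math.asin(math.sqrt(k / N))
    t = math.floor(math.pi / (4 * alpha))
    j = [1 / math.sqrt(N)] * N
    a = j[:]
    for _ in range(t):
        a = [-ai if x in S else ai for x, ai in enumerate(a)]   # oracle U_S, i.e. Ref_m
        a = reflect_about(j, a)                                 # Ref_j
    success = sum(a[x] ** 2 for x in S)
    return t, alpha, success

t, alpha, success = grover(64, {5})       # toy instance: one solution among 64
print(t, success, math.sin((2 * t + 1) * alpha) ** 2)
```

The last two printed numbers should agree, since after t iterations the state sits at angle $(2t + 1)\alpha$ from m.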


13.4 The General Case, with k Unknown 

Now we consider what happens when k is not known in advance. If we operate
as though k = 1, then we might overshoot because the rotation amount 2α
depends on the actual value of k. We might be unlucky enough to land all the
way on the other side of the circle at an angle near π or even back where we
started, whereupon measuring would give success probability near zero.

Suppose instead that we try to be cautious and measure after every i-th iteration.
Because measurement collapses the system, we would have to restart.
The expected time then becomes the sum of i from 1 to t, which is of order $t^2$,
and would exactly cancel the quadratic savings granted by the procedure when
t is known. One idea would be to try to save a copy each time before we measure,
but this could require preparing a huge number t of ancilla qubits and,
insofar as maintaining them might involve the effort of keeping up a t × t
grid, could likewise cancel the time savings.

The simplest of several known solutions is to choose the stopping time $t$ for
the iterations at random. The key is that the success probability $p$ of measuring
at the time the state vector $\mathbf{a}$ has angle $\theta$ is given by $\sin^2(\theta)$. This is like a sine
wave except steeper and with period $\pi$, not $2\pi$, staying in non-negative values.
Nevertheless, if we throw a dart along the horizontal axis of a graph of $\sin^2(\theta)$ to
choose $\theta$ at random, then there is exactly a 50-50 chance of it being more than 45 degrees
away from the $x$-axis, which still gives $\sin^2(\theta) \ge 1/2$. We need only beware
of choosing our range of dart values too narrowly when $\alpha$ is large, but we can
guard against too-large $\alpha$ by making the classical guess-at-random step come
first. The revised algorithm is:

1. Classically guess $x \in \{0,1\}^n$ uniformly at random. If success, i.e., $x \in S$, stop.

2. Initialize the vector $\mathbf{a}$ to be the start vector $\mathbf{j}$.

3. Randomly select a number $t$ such that $1 \le t \le \sqrt{N}/2$.

4. Repeat $t$ times:

4.1 Apply $\mathrm{Ref}_{\mathbf{m}}$ to $\mathbf{a}$ via the Grover oracle $U_f$, obtaining the vector $\mathbf{a}'$.

4.2 Apply $\mathrm{Ref}_{\mathbf{j}}$ to $\mathbf{a}'$, obtaining the new value of $\mathbf{a}$.

5. Measure the final state $\mathbf{a}$, giving a string $x \in \{0,1\}^n$.

6. If $x \in S$, stop: we have found a solution. Otherwise repeat the entire process.
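
The six steps above can be sketched classically by tracking only the rotation angle, which is enough to see the per-round success probability. This is a minimal sketch under stated assumptions: after $t$ iterations the state is taken to have angle $(2t+1)\alpha$ with $\alpha = \arcsin\sqrt{k/N}$, and the function names are hypothetical. It models the statistics of the procedure, not the quantum state itself.

```python
import math
import random

def round_success_probability(N, k):
    """Chance that one pass through steps 1-6 finds a solution (assumed model)."""
    alpha = math.asin(math.sqrt(k / N))
    t_max = max(1, int(math.sqrt(N) / 2))
    # Average of sin^2((2t+1)*alpha) over the random stopping time t.
    quantum = sum(math.sin((2 * t + 1) * alpha) ** 2
                  for t in range(1, t_max + 1)) / t_max
    classical = k / N  # step 1: a uniformly random guess is a hit
    return classical + (1 - classical) * quantum

def simulate_search(N, k, rng):
    """Count how many passes are made until a solution is measured (k >= 1)."""
    alpha = math.asin(math.sqrt(k / N))
    t_max = max(1, int(math.sqrt(N) / 2))
    rounds = 0
    while True:
        rounds += 1
        if rng.random() < k / N:           # step 1 succeeds
            return rounds
        t = rng.randint(1, t_max)          # step 3
        if rng.random() < math.sin((2 * t + 1) * alpha) ** 2:  # steps 4-6
            return rounds

print(round_success_probability(1024, 1))
print(simulate_search(1024, 1, random.Random(0)))
```

The caller never passes the stopping time; only $N$ is needed to choose $t$, which is the point of the randomized variant.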

The analysis is straightforward. The first step succeeds with probability at
least $\frac14$ unless $\alpha < \frac{\pi}{6}$, because the sine of 30 degrees is $\frac12$. Otherwise, one need
only show that regardless of $\alpha$ such that $1/\sqrt{N} \le \alpha < \frac{\pi}{6}$, a random rotation
by $t \cdot 2\alpha$ has at least a $\frac12 - \epsilon$ chance of falling within $\frac{\pi}{4}$ of the $y$-axis, giving
overall success probability at least $\frac12(\frac12 - \epsilon)$, where $\epsilon$ is tiny in absolute terms.
(The exercises ask how to arrange rigorously to get the probability over 24%
for all $n$ and possible values of $k$.)


13.5 Grover Approximate Counting 

We can blend Grover's search with Shor's algorithm to estimate the number
$k$ of solutions. This is equivalent to estimating the angle of rotation $2\alpha$. For
intuition, let us first suppose $2\alpha = 2\pi/r$ for some integer $r$. Then the function
$f(t) = \sin^2(2t\alpha)$ is periodic with period $r$. Hence, we can apply Shor's
algorithm to find the period $r$, which in this case would tell us $k$ exactly, and the
application adds only $n^{O(1)} = (\log N)^{O(1)}$ time. That $f(t)$ is not injective is OK
because it is at worst 4-to-1, and the case where the rotation angle does not divide
the circle evenly will still leave us able to approximate $r$ and then estimate $k$. The issue
is that if we followed chapter 11 literally, it would require first computing the
functionally superposed state of $f(t)$ over all $t$, but at first it seems hard to
get $f(t)$ without sampling from repeated measurements.

The answer is that the quantum state of Grover's algorithm after $t$ iterations,
superposed over all integers $t$ up to at most $\sqrt{N}$, already contains enough of
the right kind of information to make Shor's algorithm work. The QFT will
amplify the results for those $t$ that are close to the optimal iteration number
$\tilde{t} \approx \sqrt{N/k}$. That is, if we measure the qubits that started off holding values of
$t$, most of the amplitude will have been attracted to those $t$ that are close to $\tilde{t}$.
It will help if we can avoid "overshooting" by preventing $t > 2\tilde{t}$, which could
cancel good results. We will structure the algorithm to provide a good chance
of success before this could happen.

This conveys the essential idea, and at this point, it is fine to skip ahead to the 
next chapter on quantum random walks. The rest of this section is to make good 
on our intent to provide full details. It also exemplifies a phase-estimation task 
that is substantial but easier than the one needed for theorem 15.2 in chapter 15, 
where we do skip the proof details. 

Our goal is to estimate $k$ to within a factor of $(1 + \epsilon)$, and we will succeed
on pain of multiplying the expected time by $1/\epsilon$. Setting $\epsilon$ to be any inverse
polynomial in $n$ multiplies the time by only a polynomial in $\log N$, for overall
time still $\tilde{O}(\sqrt{N})$. By the same measure, we can afford restarting Grover's
algorithm $\log_2 \sqrt{N}$ times, each time guessing for the true $\tilde{t}$ to be double what
we tried before. Note that when $t \approx 2^\ell \ll \tilde{t}$, this means we are guessing a $k$
that is much higher than the true value, and then with high probability the
Shor-based counting routine will say "zero," whereupon we increment $\ell$ and
try again.

For our estimates, we will need finer trigonometric analysis than we have 
used before. Because this is an advanced section, we refer proofs of the fol¬ 
lowing two inequalities to general sources. 

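Lemma 13.2 For any $M \ge 1$, $\delta > 0$, and angles $\theta, \alpha, \beta$ with $0 \le M\theta \le \frac{\pi}{2}$
and $|\alpha - \beta| \le \delta$:

\[ \sin(M\theta) \ge M \sin(\theta)\cos(M\theta). \tag{13.1} \]

\[ |\sin^2(\alpha) - \sin^2(\beta)| \le 2\delta\,|\sin(\alpha)\cos(\alpha)| + \delta^2. \tag{13.2} \]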
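
Because the proofs of (13.1) and (13.2) are referred elsewhere, a numerical spot check can build some confidence in the statements as reconstructed here. The sketch below samples random parameters in the allowed ranges; the function name, sample ranges, and tolerance are arbitrary choices for illustration, and passing the check is of course not a proof.

```python
import math
import random

def check_lemma_13_2(samples=2000, tol=1e-12, seed=0):
    """Return True if no sampled point violates (13.1) or (13.2)."""
    rng = random.Random(seed)
    for _ in range(samples):
        # (13.1): need M >= 1 and 0 <= M*theta <= pi/2.
        M = 1 + 20 * rng.random()
        theta = rng.random() * (math.pi / 2) / M
        if math.sin(M * theta) < M * math.sin(theta) * math.cos(M * theta) - tol:
            return False
        # (13.2): need |alpha - beta| <= delta.
        alpha = rng.uniform(-math.pi, math.pi)
        delta = rng.random()
        beta = alpha + rng.uniform(-delta, delta)
        lhs = abs(math.sin(alpha) ** 2 - math.sin(beta) ** 2)
        rhs = 2 * delta * abs(math.sin(alpha) * math.cos(alpha)) + delta ** 2
        if lhs > rhs + tol:
            return False
    return True

print(check_lemma_13_2())
```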

The distinctive feature of this algorithm is a superposition over $t$, using
$\{0,1\}^\ell$ to code $[0, 2^\ell - 1]$. We will begin with a $1/\sqrt{2^\ell}$-weighted
superposition of states of the form

\[ \mathbf{e}_t \otimes \mathbf{j}_n, \]

where on each one we intend $t$ steps of Grover iteration starting with the $\mathbf{j}_n$
part. The issue (explored in problem 13.9 below) is that carrying along a
dependent value in the first $\ell$ quantum coordinates upsets the geometry of the
rotations. This applies not for the reflection about $\mathbf{m}$, which becomes just a
sign flip, but for the one about $\mathbf{j}_n$ itself. That is, the two-dimensional space has
to choose a fixed "$\mathbf{j}$-vector," so it does projections based on $\mathbf{j}' = \mathbf{e}_{0^\ell} \otimes \mathbf{j}_n$. To
use $\mathbf{j}'$, we must at each step undo the transformation used to create the kind of
initial superposition over $t$, reflect about $\mathbf{j}'$, and then redo the transformation
to re-create the functional superposition needed for the Grover oracle call to
work on each superposed track where the iteration is still active. Because the
QFT gives the same result as the Hadamard transform on the all-zero basis
state, we can use the QFT as a partial transform on the first $\ell$ qubits to create
the superposition. The iteration operator for $M = 2^\ell$ is

\[ Q = F_M \,\mathrm{Ref}_{\mathbf{j}'}\, F_M^{-1}\, \mathrm{Ref}_{\mathbf{m}}. \]

Happily, the extra applications of $F_M$ affect only the polylog-in-$N$ factors in
the cost. Now we need to iterate $Q$ a different number of times for each $t$. This
gives rise to the operation

\[ \mathrm{Stagger}(M \mid Q)(\mathbf{e}_t \otimes \mathbf{j}_n) = Q^t(\mathbf{e}_t \otimes \mathbf{j}_n). \]

This looks hard to perform, but the control tricks in chapter 6 show how.
$\mathrm{Stagger}(M \mid Q)$ can be coded by treating the first $\log_2 M$ quantum coordinates as
a counter initialized to one of the superposed $t$, which gets decremented with
each iteration. An iteration uses controlled gates to perform $Q$ conditioned on
the counter not being zero. Everything works in superposition without making
more than $M$ Grover oracle calls overall.

The last detail is how to choose $M$. As with Grover's original algorithm
in the case where $k$ is known, we want to avoid doing too many iterations,
but here the motive is different. Because we are superposing over all $t < M$,
overshooting is not the main issue. We will show in the proof that once $M$
is above a certain threshold, the value of $M$ does not matter much to either
the quality or likelihood of the estimate obtained. Remarkably, with probability
over $8/\pi^2 > 0.81$, the measurement will yield one of the two integers that
flank the optimal fractional value. Instead, the motive is just to minimize the
number of queries and the running time, keeping $M = O(T)$ for the bound $T$ of
theorem 13.3. Because our $k$ is unknown (indeed, $k$ is exactly what we are trying
to estimate) we do not know this threshold in advance, but we can "probe" for it
by restarting with different values for $\ell$ until we are close enough that the
returned estimate for $k$ is nonzero. Then the final value of $\ell$ gives enough
guidance on how far to jump $M$ ahead for the final run.


13.5.1 The Algorithm

As in section 13.4, we can preface the algorithm with classical random
sampling to catch the case of $k/N$ greater than $\frac14$ or some smaller constant. Assuming
the samples are all "misses," we proceed to the quantum part.

1. Choose $\epsilon$ based on $n, N$, and initialize $\ell = 1$, $M = 2^\ell$.

2. Apply $F_M$ to the first part of the start vector $\mathbf{a}_0 = \mathbf{e}_{0^\ell} \otimes \mathbf{j}_n$ to get $\mathbf{a}$.

3. While ($M \le \sqrt{N}$) do:

3.1 Compute $\mathbf{a}' = \mathrm{Stagger}(M \mid Q)\,\mathbf{a}$;

3.2 Apply $F_M^{-1}$ once more to the first $\ell$ qubits to make $\mathbf{a}''$;

3.3 Measure $\mathbf{a}''$, reading the result $v$ on the first $\ell$ qubits as a number in
$0 \ldots M-1$;

3.4 If $v > 0$, then break; otherwise do $\ell = \ell + 1$, $M = 2^\ell$, re-form $\mathbf{a}$ from $\mathbf{a}_0$
as in step 2, and begin the next while-loop iteration.

4. Using the last value $\ell$ in the loop, set $M$ to be the smallest power of 2 above
$\frac{20\pi^2 2^\ell}{\epsilon}$.

5. Form $\mathbf{a}_0$ and apply $F_M$ as a partial transform on the first $\ell' = \log_2 M$ qubits
to get $\mathbf{a}$.

6. Repeat the while loop once through and measure to get the final value $v$.

7. Round $N \sin^2(\pi \frac{v}{M})$ to an integer (not caring up or down if the fractional
part is near 0.5) and output it as the estimate $k'$ for $k$.
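
To see what steps 2 through 3.3 and step 7 produce, the following sketch computes the distribution of the reading $v$ classically. It is a simplified model, not the quantum circuit: it assumes that after the staggered iteration the track labeled $y$ has undergone exactly $y$ Grover rotations, so its hit and miss amplitudes are $\sin((2y+1)\alpha)$ and $\cos((2y+1)\alpha)$, and it then applies an inverse discrete Fourier transform to the first register. The function names and the sign convention of the transform are assumptions of this example.

```python
import cmath
import math

def counting_distribution(N, k, M):
    """Probability of each reading v in 0..M-1 under the assumed rotation model."""
    alpha = math.asin(math.sqrt(k / N))
    hit = [math.sin((2 * y + 1) * alpha) for y in range(M)]
    miss = [math.cos((2 * y + 1) * alpha) for y in range(M)]
    probs = []
    for v in range(M):
        amp_h = 0j
        amp_m = 0j
        for y in range(M):
            phase = cmath.exp(-2j * math.pi * v * y / M)
            amp_h += phase * hit[y]
            amp_m += phase * miss[y]
        # Uniform 1/sqrt(M) weights times the 1/sqrt(M) transform factor.
        probs.append((abs(amp_h) ** 2 + abs(amp_m) ** 2) / M ** 2)
    return probs

def estimate(N, M, v):
    """Step 7: turn a reading v into an estimate of k."""
    return round(N * math.sin(math.pi * v / M) ** 2)

N, k, M = 1024, 4, 256
probs = counting_distribution(N, k, M)
v = max(range(M), key=probs.__getitem__)
print(v, estimate(N, M, v))
```

Readings $v$ and $M - v$ give the same estimate because $\sin^2(\pi v/M) = \sin^2(\pi (M-v)/M)$, which is why the distribution has two mirror-image peaks.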


13.5.2 The Analysis 

First note that if $k = 0$, then $\mathbf{m}$ and $\mathbf{j}$ coincide, so that the sign-flip action
commutes with $F_M$. This causes everything to cancel, leaving $\mathbf{a}'' = \mathbf{a}_0$, whose
measurement in the first $\ell$ qubits always gives $v = 0$. The only pain is that this takes
the maximum while-loop time to find out, and in particular it makes the full
budget of about $2\sqrt{N}$ Grover oracle queries. For $k > 0$, the time is tighter:


Theorem 13.3 If $k > 0$, then with probability at least 2/3, the algorithm
outputs $k'$ such that $|k' - k| \le \epsilon k$, while using $O(T)$ evaluations and $\tilde{O}(T)$ time
overall, where

\[ T = \frac{1}{\epsilon}\sqrt{\frac{N}{k}}. \]

Proof. With reference to section 13.3, let $\alpha$ be the angle in radians between
$\mathbf{j}$ and $\mathbf{m}$, and set $m = \lfloor \log_2(\frac{1}{5\alpha}) \rfloor$. We first claim that with probability at least
$\cos^2(2/5)$, the while-loop gives 0 for all $\ell \le m$. We will then show that on the
last stage with $\ell = m + 1$, we get a good nonzero value with probability at least
$8/\pi^2$. The conclusion will follow because $\cos^2(0.4) = 0.848\ldots$ and $8/\pi^2 =
0.81\ldots$, with product $> 2/3$. We will not need to consider the eventuality that
the while-loop bound $\sqrt{N}$ is exceeded. Let

\[ \mathbf{s}_M(b) = \frac{1}{\sqrt{M}} \sum_{y=0}^{M-1} e^{2\pi i b y}\, \mathbf{e}_y. \]

Then $F_M \mathbf{e}_x = \mathbf{s}_M(\frac{x}{M})$. It follows that if $b$ is an integer multiple $x$ of $\frac{1}{M}$, then
measuring $F_M^{-1}\mathbf{s}_M(b)$ recovers $\mathbf{e}_x$ with certainty, even for $x = 0$. Moreover, for
any $b$ and $x$, the chance of obtaining $\mathbf{e}_x$ by measuring $F_M^{-1}\mathbf{s}_M(b)$ is


\[ p_x = |\langle \mathbf{e}_x, F_M^{-1}\mathbf{s}_M(b)\rangle|^2
       = |\langle F_M \mathbf{e}_x, \mathbf{s}_M(b)\rangle|^2
       = \left|\frac{1}{M}\sum_{y=0}^{M-1} e^{-2\pi i x y/M}\, e^{2\pi i b y}\right|^2
       = \frac{1}{M^2}\left|\sum_{y=0}^{M-1} e^{2\pi i d y}\right|^2, \]

writing $d = |\frac{x}{M} - b|$. When $\mathrm{Stagger}(M \mid Q)$ is applied to $(F_M \mathbf{e}_{0^\ell}) \otimes \mathbf{j}_n$, we
have $x = 0$ and $b = \frac{\alpha}{\pi}$, so $d = \frac{\alpha}{\pi}$. Thus, by the derivation of (11.2) in
section 11.5, we finally obtain

\[ p_0 = \frac{1}{M^2}\,\frac{\sin^2(M\pi d)}{\sin^2(\pi d)}, \]

which is the chance of the measurement giving $\mathbf{e}_0$, that is, 0. Accordingly, the
probability of getting the first $m$ trials zero is

\[ P_0 = \prod_{\ell=1}^{m} \frac{\sin^2(2^\ell \alpha)}{2^{2\ell}\sin^2(\alpha)}. \]

Because $2^\ell \alpha \le 2^m \alpha \le \frac15 < \frac{\pi}{2}$ satisfies the hypothesis of inequality (13.1) in
lemma 13.2, we can apply $\sin(M\alpha) \ge M \sin(\alpha)\cos(M\alpha)$ to yield

\[ P_0 \ge \prod_{\ell=1}^{m} \cos^2(2^\ell \alpha) = \frac{\sin^2(2^{m+1}\alpha)}{2^{2m}\sin^2(2\alpha)}. \]

Applying the inequality again with $2^m$ in place of $M$ and $2\alpha$ in place of $\alpha$
makes

\[ P_0 \ge \frac{2^{2m}\sin^2(2\alpha)\cos^2(2^{m+1}\alpha)}{2^{2m}\sin^2(2\alpha)}
       = \cos^2(2^{m+1}\alpha) \ge \cos^2(\tfrac{2}{5}), \]

by the choice of $m$. This completes the first goal.

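For the second goal, given that the first goal has succeeded, we note that
because the loop has $\ell = m + 1$ on the first nonzero value, the inequality
$2^{m+1} > \frac{1}{5\alpha}$ gives

\[ M \ge \frac{20\pi^2\, 2^{m+1}}{\epsilon} \ge \frac{4\pi^2}{\epsilon\alpha}. \]

Using that $\alpha \le \frac{\pi}{2}\sin(\alpha)$ and $\sin(\alpha) = \sqrt{k/N}$, this further yields

\[ M \ge \frac{8\pi}{\epsilon \sin(\alpha)} = \frac{8\pi\sqrt{N/k}}{\epsilon}. \]

Let $\tilde{v}$ be such that $\frac{k}{N} = \sin^2(\pi \frac{\tilde{v}}{M})$. For a result $v$ that we get, we are interested
in $|\sin^2(\pi \frac{v}{M}) - \sin^2(\pi \frac{\tilde{v}}{M})|$. When $v$ is one of the two flanking integers of $\tilde{v}$, the
difference in the angles will be at most $\delta = \frac{\pi}{M}$. By (13.2) in lemma 13.2, we
will have:

\[ \left|\sin^2(\pi \tfrac{v}{M}) - \sin^2(\pi \tfrac{\tilde{v}}{M})\right|
   \le 2\delta\,\left|\sin(\pi \tfrac{\tilde{v}}{M})\cos(\pi \tfrac{\tilde{v}}{M})\right| + \delta^2
   = 2\delta\sqrt{\sin^2(\pi \tfrac{\tilde{v}}{M})\,(1 - \sin^2(\pi \tfrac{\tilde{v}}{M}))} + \delta^2
   = 2\delta\,\frac{\sqrt{k(N-k)}}{N} + \delta^2. \]

Substituting for $\delta$ gives:

\[ \frac{2\pi\sqrt{k(N-k)}}{NM} + \frac{\pi^2}{M^2}
   \le \frac{2\pi\sqrt{k(N-k)}}{N \cdot 8\pi\sqrt{N/k}/\epsilon} + \frac{\pi^2}{64\pi^2 (N/k)/\epsilon^2}
   = \frac{\epsilon}{4N}\sqrt{k(N-k)}\sqrt{\frac{k}{N}} + \frac{\epsilon^2}{64N}\,k
   \le \frac{\epsilon}{4N}\,k + \frac{\epsilon^2}{64N}\,k. \]

Hence, the additive error in the estimate $N \sin^2(\pi \frac{v}{M})$ before rounding is at most
$(\frac{\epsilon}{4} + \frac{\epsilon^2}{64})k$. This is small enough that the $k'$ obtained after rounding gives

\[ |k - k'| \le \epsilon k, \]

which is what we need to prove. Hence, we have done everything except
lower-bound the probability of getting the measured $v$ to be one of the two flanking
integers of $\tilde{v}$.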

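To do this, we return to the geometry of what is happening in our product
Hilbert space of the first $\ell'$ qubits and the space spanned by the original Grover
hit and miss vectors $\mathbf{h}$ and $\mathbf{m}$ on the final run-through from step 5. The vector
$\mathbf{a}_0$ in this representation is

\[ \mathbf{a}_0 = \frac{1}{\sqrt{2}}\,\mathbf{e}_0 \otimes (e^{i\alpha}\mathbf{h} - e^{-i\alpha}\mathbf{m}). \]

After the application of $F_M$ as a partial transform on the first part, and ignoring
the global phase factor that came from the rotated basis, we have

\[ \mathbf{a} = \frac{1}{\sqrt{2M}}\sum_{y=0}^{M-1} \mathbf{e}_y \otimes (e^{i\alpha}\mathbf{h} - e^{-i\alpha}\mathbf{m}). \]

After the differential numbers of Grover rotations given by $\mathrm{Stagger}(M \mid Q)$,
we have:

\[ \mathbf{a}' = \frac{1}{\sqrt{2M}}\sum_{y=0}^{M-1} \mathbf{e}_y \otimes \left(e^{i(\alpha+2y\alpha)}\mathbf{h} - e^{-i(\alpha+2y\alpha)}\mathbf{m}\right)
   = \frac{e^{i\alpha}}{\sqrt{2M}}\sum_{y=0}^{M-1} e^{i2y\alpha}(\mathbf{e}_y \otimes \mathbf{h})
     - \frac{e^{-i\alpha}}{\sqrt{2M}}\sum_{y=0}^{M-1} e^{-i2y\alpha}(\mathbf{e}_y \otimes \mathbf{m})
   = \frac{e^{i\alpha}}{\sqrt{2}}\,\mathbf{s}_M(\tfrac{\alpha}{\pi}) \otimes \mathbf{h}
     - \frac{e^{-i\alpha}}{\sqrt{2}}\,\mathbf{s}_M(-\tfrac{\alpha}{\pi}) \otimes \mathbf{m}. \]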


Hence, what we measure in the end after applying the final $F_M^{-1}$ on the
first space is an evenly weighted mix of measuring either $F_M^{-1}\mathbf{s}_M(\frac{\alpha}{\pi})$ or
$F_M^{-1}\mathbf{s}_M(-\frac{\alpha}{\pi})$. Now the amplitude the latter contributes to $\mathbf{e}_y$ equals what the
former contributes to $\mathbf{e}_{M-y}$, and because

\[ \sin^2(\pi \tfrac{M - y}{M}) = \sin^2(\pi \tfrac{y}{M}), \]

the overall probability of obtaining the estimate given by $y$ is the same as what we get
from measuring $F_M^{-1}\mathbf{s}_M(\frac{\alpha}{\pi})$. If $\frac{\alpha}{\pi}$ were an integer multiple $y$ of $1/M$, then by the Fourier
analysis at the start of this proof, we would obtain $y$ with certainty. As it is,
we need only lower-bound the probability of getting $y$ such that $|\frac{y}{M} - \frac{\alpha}{\pi}| \le \frac{1}{M}$.

Let $D = M\frac{\alpha}{\pi} - \lfloor M\frac{\alpha}{\pi} \rfloor$, so $0 \le D < 1$, and $d = \frac{D}{M}$. Then we have that the
probability of the measurement $y$ producing either the integer above or the integer
below the target value $M\frac{\alpha}{\pi}$ is

\[ \frac{\sin^2(M\pi d)}{M^2 \sin^2(\pi d)} + \frac{\sin^2(M\pi(\frac{1}{M} - d))}{M^2 \sin^2(\pi(\frac{1}{M} - d))}. \]

This attains its minimum for $d = \frac{1}{2M}$ uniquely when $M > 2$, whereupon it
becomes

\[ \frac{2\sin^2(\frac{\pi}{2})}{M^2 \sin^2(\frac{\pi}{2M})} \ge \frac{2}{M^2}\left(\frac{2M}{\pi}\right)^2 = \frac{8}{\pi^2}. \]

Thus, with at least this probability, the value $y = v$ returned by the
measurement gives $|\pi\frac{v}{M} - \alpha| \le \frac{\pi}{M}$, and it follows that the estimate $k' = N\sin^2(\pi \frac{v}{M})$ is
within $\epsilon k$ of the true value $k$. □
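
The last bound in the proof can be checked numerically. The sketch below evaluates the two-term flanking probability on a grid of fractional parts $D$ and locates its smallest value; the function name and grid resolution are arbitrary choices for this illustration.

```python
import math

def flank_probability(M, D):
    """Chance of reading one of the two integers flanking a target whose
    fractional part is D, with 0 < D < 1 (formula from the proof)."""
    d = D / M

    def kernel(x):
        return math.sin(M * math.pi * x) ** 2 / (M ** 2 * math.sin(math.pi * x) ** 2)

    return kernel(d) + kernel(1 / M - d)

M = 64
grid = [i / 1000 for i in range(1, 1000)]
worst_D = min(grid, key=lambda D: flank_probability(M, D))
print(worst_D, flank_probability(M, worst_D), 8 / math.pi ** 2)
```

On this grid the smallest value occurs at the midpoint between two integers and stays above $8/\pi^2$, in line with the analysis.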

The analysis still allows bad values with probability slightly less than 
1/3. However, the displacement of success away from 1/2 implies that with 
repeated trials, there will be clustering around a unique correct value, and aver¬ 
aging the cluster (while discriminating away outliers) will produce an estimate 
with higher confidence. 


13.6 Problems 

13.1. Calculate the $2 \times 2$ matrix of the action of one iteration $\mathrm{Ref}_{\mathbf{j}}\mathrm{Ref}_{\mathbf{m}}$ in the
$\mathbf{h}, \mathbf{m}$ basis. Here the first dimension holds the aggregate value of the $k$-many
coordinates that are hits, and the other holds the value of the $N - k$ misses.
Note that the hit vector $\mathbf{h}$ becomes $(1,0)$, and $\mathbf{m}$ becomes $(0,1)$.

13.2. With reference to problem 13.1, what does j become as a unit vector in 
the h,m basis? What does the action look like when the 2x2 matrix is given 
with respect to the (nonorthogonal) basis formed by h and j instead? 

13.3. For what initial value of $\theta$ does Grover's algorithm guarantee finding a
solution (100% probability) with a measurement after one iteration? Find the
corresponding value of $k$.

13.4. What happens if you do another Grover iteration when $\theta = \frac{\pi}{4}$ (i.e., 45°)
rather than stop as the algorithm indicates?

13.5. Calculate the exact number of Grover iterations when $N = 2^8 = 256$ and
$k = 4$. Is it the same as when $N = 64$ and $k = 1$? What is the success
probability of one trial in each case?

13.6. Complete the analysis of the general case in section 13.4, choosing $t$
and bounding $\alpha$ more carefully to make it work with $\epsilon = 0.01$, giving success
probability at least 0.24 in each trial.

13.7. Show how to use Grover's algorithm to decide whether an $n$-vertex graph
has a triangle in quantum time about $O(n^{3/2})$.

13.8. Consider the following alternative to the strategy of section 13.4 for the
case where the number $k$ of solutions is unknown. For some integer $c > 0$, first
do $c + 1$ trials for $t = 1$ to $c + 1$ in which measurement is done after $t$ iterations,
restarting the whole process if a solution is not found. Then obtain the next $t$
by multiplying the old $t$ by $(1 + \frac{1}{c})$ and rounding up so the sequence continues
$c + 3, c + 5, \ldots, 2c + 1, 2c + 4, \ldots$, eventually growing exponentially. Give
an estimate for the expected running time as a function of $k$ and $n$. What value
of $c$ minimizes your estimate?

13.9. Suppose we want to execute Grover search with functionally superposed
states. That is, for some (classically feasible) function $g$, whenever the basic
Grover algorithm is in a state $\sum_x a_x \mathbf{e}_x$, our algorithm will need to be in the
state

\[ \sum_x a_x |x\rangle|g(x)\rangle = \sum_x a_x\, \mathbf{e}_x \otimes \mathbf{e}_{g(x)}. \]

Show that the algorithm works unchanged provided $g(x)$ is a constant function.

13.10. When g is not constant, does the idea in problem 13.9 work? 

13.11. Suppose $S(n)$ is the time to compute

\[ G(\mathbf{e}_{x0^r}) = \mathbf{e}_{x\,g(x)} \]

whether on a single basic input $x$ or a superposed one. Show that we can
modify the Grover iteration to work with the functional states, on pain of $2S(n)$
becoming an extra multiplicative factor on the time. (Replicating this "setup
time" for the functional superposition was OK in section 13.5 but will not be
in chapter 15.)


13.7 Summary and Notes

Grover's algorithm appeared in the 1996 STOC conference (Grover, 1996) and
the next year in full journal form (Grover, 1997). Originally, it assumed that
there is only one solution. Later it was realized that the idea works for any
number of solutions. Unlike the case of Shor's algorithm, there has been relatively
little practical argument against the assertion that the functionally superposed
states $\frac{1}{\sqrt{N}}\sum_x |x\rangle|f(x)\rangle$ are feasible to prepare, so that the oracle operation $U_f$
is feasible. It has been noted, however (see discussion in the texts by
Hirvensalo (2010) and Rieffel and Polak (2011)), that for many particular functions
$f$, similar speedup can be obtained by classical means. This leaves the question
of building $U_f$ for functions $f$ for which sharp lower bounds on classical search
might be provable.

Another question, especially when $k$ is known, is why can't we jump ahead
by computing the right number of iterations in one sweep? The answer is
that computation apart from $U_f$ can gain little or no information about
the angle of the hit vector, whereas $U_f$ provides only the given small angle
on each call. It has been proven rigorously in various ways (see Bennett
et al. (1997) and Beals et al. (2001)) that $\Omega(\sqrt{N})$ calls to $U_f$ are
necessary, making this an asymptotically tight bound up to constant factors. Our
long section on approximate counting follows the main article (Brassard et al.,
2000), which gives full details over the earlier conference version (Brassard
et al., 1998). The former article also gives more general applications,
trade-offs between success probability and closeness of the estimate, and some
results with nontrivial exact counting. Problem 13.8 follows lecture notes
(https://cs.uwaterloo.ca/~watrous/LectureNotes.html) by John Watrous.

Grover's algorithm is greatly important for two reasons. First, it gives only a
polynomial speedup: a search of cost $T$ roughly becomes a search of cost $\sqrt{T}$.
But it is completely general. This ability to speed up almost any kind of search
has led to a large amount of research, even more than what we exemplify in
chapters 14 and 15. Perhaps if quantum computers one day are real, Grover's
algorithm may be used in many ways in practice.

Second, it showed that there were other types of algorithms. All the previ¬ 
ous algorithms that we have discussed had just a few steps and/or turned on 
an immediate property of the Hadamard or Fourier transform. Grover’s algo¬ 
rithm did the most to break this form and pointed the way to more intricate 
algorithms when combined with quantum walks, which we cover next. 




Quantum Walks 


This chapter gives a self-contained and elementary presentation of quantum 
walks, needing only previous coverage in this text of graph theory and the sum- 
over-paths behavior. In the next chapter, where we cover search algorithms 
using quantum walks, the material is necessarily more advanced, and we have 
chosen to emphasize the intuition at some expense of detail. 

Both quantum and classical random walks can be visualized as walks on 
graphs. The graphs may be finite or infinite, directed or undirected. We will 
work toward a bird’s-eye view of a quantum walk as a deterministic process 
(before any measurement) and will follow recent usage of excising the word 
“random” in the quantum case. First we consider the classical case. 


14.1 Classical Random Walks 

Classical random walks on graphs are a fundamental topic in computational
theory. The idea of a walk is easy to picture. Suppose you are at a node $u \in V$,
and suppose there are edges out of $u$ to neighbors $v_1, \ldots, v_d$. In the standard
random walk, you pick a neighbor $v_i$ at random, that is, with probability $1/d$.
In a general random walk, there is a specified probability $p_{u \to v_i}$ for each
neighbor $v_i$. Either way, if $v_i$ is chosen, then you go there by setting $u = v_i$ and repeat
this process. There are three main questions about a classical random walk:

1. Given a node $w$ different from $u$, what is the expectation for the number of
steps to reach $w$ starting from $u$?

2. How many steps are expected for the walk to visit all nodes $w$ in the graph,
in case $n = |V|$ is finite?

3. If you stop the walk after a given number $t$ of steps, what is the probability
$p_t(w)$ of ending at node $w$? How does it behave as $t$ gets large?

The questions can have dramatically different answers depending on whether
$G$ is directed or undirected. To see this, first consider the undirected graph
in which the vertices stand for integers $i$, and each $i$ is connected to $i - 1$ and
$i + 1$. If we start at 0, then what is the expected number of steps to reach node
$n$? Each step is a coin flip: heads you move right, tails you move left. Hence,
reaching cell $n$ means sometimes having an excess of $n$ more heads than tails.
Now the standard deviation of $N$-many coin flips is proportional to $\sqrt{N}$, and it
follows that the expected time to have a positive deviation of $n$ is $O(n^2)$.

This result carries over to any undirected $n$-vertex graph. If node $y$ is
reachable at all from node $x$, then there is a path from $x$ to $y$ of length at most $n - 1$.
It is possible that some node $u$ along this path may have degree $d \ge 3$ with
$d - 1$ of the neighbors farther away from $y$, so that the chance of immediate
progress is only $1/d$. However, this entails that the original distance from $x$ to
$y$ was at most $n - d + 1$. Thus, any graph structure richer than the simple path
trades against the length, and it can be shown that the $O(n^2)$ step expectation
of the simple path remains the worst case to reach any given node in the same
connected component as $x$. In particular, this yields an economical randomized
algorithm to tell whether an undirected graph is connected by taking a walk of
$O(n^2)$ steps and tracking the different nodes encountered.

For directed graphs, however, the time can be exponential. Consider directed
graphs $G_n$ with $V = \{0, \ldots, n - 1\}$ and edges $(i, i + 1)$ and $(i, 0)$ for each $i$. The
walk starts at $u = 0$ and has goal node $y = n - 1$, which we may suppose has
both out-edges going to 0. Now a "tail" sends the walk all the way back to
0, so the event of reaching $y$ is the same as getting $n - 1$ consecutive heads.
The expected time for this is proportional to $2^n$. Thus, mazes with one-way
corridors are harder to traverse than the familiar kind with undirected corridors.
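
The contrast between the two cases can be made concrete by computing expected hitting times exactly from one-step recurrences. This is a sketch under stated assumptions: for the undirected case it uses the finite path on nodes $0, \ldots, n$ with a reflecting end at 0 (an assumption made here so that the expectation is finite and easy to compute), and for the directed case it uses $G_n$ as described above. The function names are hypothetical.

```python
def path_hitting_time(n):
    """Expected steps from 0 to n on the path 0..n, reflecting at 0.

    If D_i is the expected time to get from i to i+1, then D_0 = 1 and
    D_i = 1 + (1/2)(D_{i-1} + D_i), which simplifies to D_i = D_{i-1} + 2.
    """
    total, d = 0, 1
    for _ in range(n):
        total += d
        d += 2
    return total

def directed_hitting_time(n):
    """Expected steps from 0 to n-1 in G_n (heads: i -> i+1, tails: i -> 0).

    If T_m is the expected time to reach m from 0, then
    T_{m+1} = T_m + 1 + (1/2) T_{m+1}, which simplifies to T_{m+1} = 2 T_m + 2.
    """
    t = 0
    for _ in range(n - 1):
        t = 2 * t + 2
    return t

for n in (5, 10, 20):
    print(n, path_hitting_time(n), directed_hitting_time(n))
```

The first quantity grows quadratically in $n$ and the second exponentially, matching the two regimes described in the text.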

There are two main further insights on the road to quantum walks. The first 
is that quite apart from how directedness can make locations difficult to reach 
with high probability, it is possible to cancel the probability of being in certain 
locations at certain times altogether. The second is like the difference between 
AC and DC electricity. Instead of seeing a walk as “going somewhere” like a 
direct current, it is better to view it as a dance back and forth on the vertices 
according to some eventually realized distribution. Both insights require rep¬ 
resenting walks in terms of actions by matrices, and again we can get much 
initial mileage from the classical case. 


14.2 Random Walks and Matrices 

Classical random walks on graphs $G = (V,E)$ can be specified by matrices $A$
whose rows and columns correspond to nodes $u, v$. Here $A$ is like the
adjacency matrix of $G$, in that $A[u,v] \ne 0$ only if there is an edge from $u$ to $v$ in
$G$, but the entries on edges are probabilities. Namely, $A[u,v] = p_{u \to v}$, which
denotes the probability of going next to $v$ if the "walker" is at $u$. The matrix $A$
is row-stochastic as defined in section 3.5; that is, the values in each row are
nonnegative and sum to 1.

It follows that $A^2$ is also a row-stochastic matrix and gives the probabilities
of pairs of steps at a time. That is, for any nodes $u$ and $w$,

\[ A^2[u,w] = \sum_v A[u,v]\,A[v,w] = \sum_v p_{u \to v}\, p_{v \to w}. \]

Because the events of the walk going from $u$ to different nodes $v$ are mutually
exclusive and collectively exhaustive, this sum indeed gives the probability of
going from $u$ to $w$ in two steps. The same goes for $A^3$ and paths of three steps,
and so on for $A^k$, all $k > 0$.

A probability distribution $D$ on the nodes of $G$ is stable under $A$ if for all
nodes $v$,

\[ D(v) = \sum_u D(u)\,A[u,v]. \]

Intuitively, this says that if $D(v)$ is the probability of finding a missing
parachute jumper at any location $v$, then the probability is the same even if
the jumper has had time to do a random step according to $A$ after landing.
Mathematically, this says that $D$ is an eigenvector of $A$, with eigenvalue 1; the
eigenvector is on the left, giving $DA = D$.

If $G$ is a connected, finite, undirected graph that is not bipartite, there is an
integer $k$ such that for all $\ell \ge k$ and all $x, y \in V$, there is a path of exactly $\ell$ steps
from $x$ to $y$. It follows that for any matrix $A$ defining a random walk on $G$,
all entries of $A^k$ and all higher powers are nonzero. It then further follows (this
is a hard theorem) that the powers of $A$ converge pointwise to a matrix $A^*$ that
projects onto some stationary distribution $D$. That is, for any initial distribution
$C$, $CA^* = D$, and moreover the sequence $C_k = CA^k$ converges pointwise to
$D$. This goes even for the distribution $C(u) = 1$, $C(v) = 0$ for all $v \ne u$, which
represents our random-traveler initially on node $u$.

When $A$ is the standard random walk, the limiting probability is

\[ D(u) = \frac{\deg(u)}{2|E|}. \]

Nonuniform walks $A$ may have other limiting probabilities, but they still have
the remarkable property that any initial distribution converges pointwise
to $D$. The relation between $\epsilon > 0$ and the power $k$ needed to ensure $\|CA^\ell -
D\| < \epsilon$ for all $\ell \ge k$, where $\|\cdot\|$ is the max-norm, is called the mixing time
of $A$, while the $k$ giving $\max_{u,v} |D(v) - A^\ell[u,v]| < \epsilon$ for all $\ell \ge k$ is called the
hitting time.
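
The convergence just described can be observed on a small example. The sketch below builds the standard random walk on a hypothetical four-node graph (a triangle with one pendant node, chosen here because it is connected and not bipartite), repeatedly applies $C \mapsto CA$ starting from a point distribution, and compares the result with $\deg(u)/(2|E|)$. The graph, the step count, and the function names are assumptions of this example.

```python
def standard_walk(adj):
    """Row-stochastic matrix of the standard walk on an adjacency list."""
    n = len(adj)
    return [[(1.0 / len(adj[u]) if v in adj[u] else 0.0) for v in range(n)]
            for u in range(n)]

def step(C, A):
    """One step of the walk acting on a row distribution: C -> C A."""
    n = len(A)
    return [sum(C[u] * A[u][v] for u in range(n)) for v in range(n)]

# Triangle 0-1-2 plus a pendant edge 2-3.
adj = [[1, 2], [0, 2], [0, 1, 3], [2]]
A = standard_walk(adj)

C = [1.0, 0.0, 0.0, 0.0]          # traveler starts on node 0
for _ in range(200):
    C = step(C, A)

num_edges = sum(len(nb) for nb in adj) // 2
D = [len(nb) / (2 * num_edges) for nb in adj]
print(C)
print(D)
```

Replacing the graph with a bipartite one, such as a single edge or a four-cycle, makes the iterates alternate instead of settling, as the next paragraph of the text explains.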

If $G$ is bipartite, there is still a stationary $D$, but not all $C$ will be carried
toward it: any distribution with support confined to one of the two partitions
will alternate between the partitions. When $G$ is directed, similar behavior
occurs with period 3 in a directed triangle, and so on. However, provided that
for every $u, v$ and prime $p$ there is a path from $u$ to $v$ whose number of steps
is not a multiple of $p$, the above limiting properties do hold for every random
walk on $G$, and the notions of mixing and hitting times are well defined. In an
undirected graph, for constant $\epsilon$, the hitting time of any walk is polynomial,
but in a directed graph, even the standard walk may need exponential time, as
the directed graphs in the last section show.

The analogy here is that the stationary distribution $D$ is like "AC current" in
that you picture a one-step dance back and forth, but the overall state remains
the same. This differs from the "DC" view of a traveler going on a random
walk. What distinguishes the quantum case is that via the magic of quantum
cancellation, we can often arrange for $D(v)$ to be zero for many undesired
locations $v$, and hence pump up the probability of the "traveler" being measured
as being at a desired location $u$.


14.3 An Encoding Nicety 

To prepare for the notion of quantum walks, we consider the probabilities $p$ as
being derived from a set $C$ of random outcomes. In the background is a
function $h(u, c) = v$ that specifies the destination node for each outcome $c$ when the
traveler is at node $u$.

To encode the standard random walk in which the next node is chosen with
equal probability among all out-neighbors $v$ of $u$, we simply take $|C|$ to be the
least common multiple of the out-degrees of all the vertices in the graph. Then
for each vertex, we assign outcomes in $C$ to choices of neighbor evenly. This is
well defined also for classes of infinite graphs of bounded degree. Indeed, the
infinite path graph remains a featured example, taking $C = \{0,1\}$ and
thinking of $c$ as a "coin flip." We could extend this formalism to allow arbitrary
distributions on $C$, but uniform suffices for the main facts and applications.

Now we make a matrix $A'$ whose rows index pairs $u, c$ of nodes and random
outcomes. We can write this pair without the comma. So we define

\[ A'[uc, v] = \begin{cases} 1 & \text{if } h(u,c) = v \\ 0 & \text{otherwise.} \end{cases} \]

Now each row has one 1 and $n - 1$ 0s. We can, however, obtain the stochastic
matrix $A$ above via

\[ A[u,v] = \frac{1}{|C|}\sum_c A'[uc, v]. \]

If we had a nonuniform distribution on $C$, then we could use a weighted sum
accordingly. Note also that our functional view of matrices makes this
undisturbed by the possibility that $V$, and hence $A$ and $A'$, could be infinite.

We do one more notational change that already helps with the classical case
by making the matrix square again. We make the same random outcome part
of the column value as well by defining:

\[ B[uc, vc'] = \begin{cases} 1 & \text{if } h(u,c) = v \text{ and } c' = c, \\ 0 & \text{otherwise.} \end{cases} \]

Then $B$ acts like the identity on the $C$-coordinates and like $A$ on the
$V$-coordinates, that is, on the nodes. Now the stochastic matrix $A$ is given by

\[ A[u,v] = \frac{1}{|C|}\sum_c B[uc, vc]. \]

The sum on the right-hand side goes down the diagonal of the $C$-part, much
like the trace operation does on an entire matrix. It is called a partial trace
operation, and is related to the ideas in section 6.7. In the classical case, all this
trick does is get us back to our original idea of entries $A[u,v]$ being
probabilities, but it will help in the quantum case where they are amplitudes. Having
digested this, we can progress to define quantum walks with a minimum of
fuss.
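
The encoding can be tried out on a tiny example. The sketch below uses a hypothetical four-node cycle with $C = \{0,1\}$ and $h(u,0) = u - 1$, $h(u,1) = u + 1$ (mod 4), builds $B$ with the pair $uc$ flattened to the index $u|C| + c$, and recovers $A$ by the partial-trace sum. The flattening convention and function names are choices made for this illustration.

```python
def build_B(n, num_outcomes, h):
    """0/1 matrix with B[uc][vc'] = 1 exactly when h(u, c) == v and c' == c."""
    size = n * num_outcomes
    B = [[0] * size for _ in range(size)]
    for u in range(n):
        for c in range(num_outcomes):
            v = h(u, c)
            B[u * num_outcomes + c][v * num_outcomes + c] = 1
    return B

def partial_trace(B, n, num_outcomes):
    """A[u][v] = (1/|C|) * sum over c of B[uc][vc]."""
    return [[sum(B[u * num_outcomes + c][v * num_outcomes + c]
                 for c in range(num_outcomes)) / num_outcomes
             for v in range(n)]
            for u in range(n)]

n = 4
h = lambda u, c: (u - 1) % n if c == 0 else (u + 1) % n
B = build_B(n, 2, h)
A = partial_trace(B, n, 2)
for row in A:
    print(row)
```

Each row of $B$ contains a single 1, while each row of the recovered $A$ sums to 1, as the text describes.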


14.4 Defining Quantum Walks 

The reason we need the added notation of C in the quantum case is that on the 
whole space V(G) <S> C, quantum evolution is an entirely deterministic process. 
What gives the effect of a randomized walk is that the action on C is unknown 
and unseen before being implicitly “traced out.” 

Definition 14.1 A quantum walk on the graph G is defined by a matrix U
with analogous notation to B above, but where U is unitary, and allowing the
action $U_C$ of U on the C coordinates to be different from the identity.

Indeed, in the case $|C| = 2$, by making $U_C$ have the action of a 2 x 2
Hadamard matrix H, we can also simulate the action of flipping the coin at
each step. Again the action of H is deterministic, but because measurement
involves making a choice over the entries of H, the end effect is
nondeterministic. Here is an example that packs a surprise.

Let G be the path graph with seven nodes, labeled $u = -3, -2, -1, 0, 1, 2, 3$.
Our state space is $V(G) \otimes \{0,1\}$. To flip a coin, we apply the unitary matrix
$C = I \otimes H$, where I is the seven-dimensional identity matrix. To effect the
outcome b, we apply the 14 x 14 permutation matrix P that maps $(u, 0)$ to
$(u - 1, 0)$ and $(u, 1)$ to $(u + 1, 1)$. Because we will apply this only three times
to a traveler beginning at 0, it doesn’t matter in figure 14.1 where $(-3, 0)$ and
$(3, 1)$ are mapped; to preserve the permutation property they can go to $(3, 0)$
and $(-3, 1)$, respectively, thus making the action on V(G) circular. Our walk
matrix is thus $A = PC$. We apply $A^3$ to the quantum basis state $a_0$ that has a
1 in the coordinate for (0,0) and a 0 everywhere else.

Figure 14.1 

Expanded graph G' of quantum walk on path graph G. 



In three steps of a classical random walk on G starting at the origin, the
probabilities on the nodes $(-3, -1, 1, 3)$, respectively, are $(\frac18, \frac38, \frac38, \frac18)$ according
to the familiar binomial distribution. (Those on the even nodes are zero because
G is bipartite.) This is arrived at by summing over paths, each path being a
product of three entries of the walk matrix. Because each nonzero entry is $\frac12$,
the middle values come about because there are three different ways to go from
0 to $+1$ in three steps and likewise from 0 to $-1$.
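The sum over paths for the classical walk can be spelled out directly. This is a small sketch, assuming the walk is short enough that the endpoints of the path graph are never crossed; the helper name `classical_distribution` is illustrative.

```python
from fractions import Fraction
from itertools import product

def classical_distribution(steps):
    """Sum over all coin sequences; each path has weight (1/2)**steps."""
    dist = {}
    for flips in product((-1, +1), repeat=steps):
        end = sum(flips)                      # final node reached from 0
        dist[end] = dist.get(end, 0) + Fraction(1, 2 ** steps)
    return dist

dist3 = classical_distribution(3)
print([str(dist3[u]) for u in (-3, -1, 1, 3)])
```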


14.5 Interference and Diffusion 

Having created the graph for the quantum walk, we need to say who our walker
is going to be. Of course, it is the Feynman mouse Phil we encountered in
chapter 7. That is to say, our quantum walk involves a sum over paths, with
each path being a product of three entries in the matrix A. Much is the same
as with the three-stage mazes in that chapter. There is Hadamard cheese: the
numerators of the nonzero entries can be $-1$ and $+1$. The denominators have
$\sqrt{2}$ rather than 2 as with classical probability, but they will be squared when
going from amplitudes to probabilities at the end. Third, and the unseen part
under the hood, the paths being summed by nature fork not only in the G part
but also in the C part of the space. That is to say, each coin outcome, which is
represented by a column of the ordinary 2 x 2 Hadamard matrix

\[ H = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \]

has two ways of reaching that outcome, via the first or second row. When the
coin outcome is 0 for “tails,” both entries contribute a numerator of $+1$, but
when the outcome is 1, one path contributes a $+1$ and the other $-1$.

Hence, the walk is really taking place in a 14-node graph $G'$ that includes
the coin flips. This graph has directed edges from $(u, 0)$ and $(u, 1)$ to $(u - 1, 0)$
for the outcome “tails,” say wrapping around to $(3, 0)$ in the case $u = -3$.
For “heads,” it has edges from $(u, 0)$ and $(u, 1)$ to $(u + 1, 1)$, again wrapping
around, with the crucial difference that the rightward edges from $(u, 1)$
(representing a previous outcome of heads) have multiplier $-1$. The other edges have
$+1$. Now the three-step paths from (0,0) in $G'$, and their multiplier values, are:


(0,0) → (1,1) → (2,1) → (3,1) : 1 · (-1) · (-1) = 1

(0,0) → (1,1) → (2,1) → (1,0) : 1 · (-1) · 1 = -1

(0,0) → (1,1) → (0,0) → (1,1) : 1 · 1 · 1 = 1

(0,0) → (1,1) → (0,0) → (-1,0) : 1 · 1 · 1 = 1

(0,0) → (-1,0) → (0,1) → (1,1) : 1 · 1 · (-1) = -1

(0,0) → (-1,0) → (0,1) → (-1,0) : 1 · 1 · 1 = 1

(0,0) → (-1,0) → (-2,0) → (-1,1) : 1 · 1 · 1 = 1

(0,0) → (-1,0) → (-2,0) → (-3,0) : 1 · 1 · 1 = 1.

Thus, there are six, not four, different destinations. The crux is that the two
paths that reach destination (1,1) have multipliers of 1 and $-1$ and hence
cancel, while the two paths with destination $(-1, 0)$ both have multipliers of 1 and
hence amplify. The 14-dimensional vector representing the quantum state of
the outcome of the walk, given the initial vector that had a 1 in place (0,0) and
nothing else, becomes this when arranged in a 2 x 7 grid (columns are the nodes
$-3$ through 3; the top row is coin value 1 and the bottom row is coin value 0):

\[ b_0 = \frac{1}{\sqrt{8}} \begin{bmatrix} 0 & 0 & 1 & 0 & 0 & 0 & 1 \\ 1 & 0 & 2 & 0 & -1 & 0 & 0 \end{bmatrix}. \]


Finally, to obtain the probabilities while “tracing out” the unseen coin, we sum 
the squares in each column. The final classical probability vector, giving the 
probability of finding the “traveler” at each of the original seven nodes of G 
after a measurement, is 


\[ \left[ \tfrac18,\ 0,\ \tfrac58,\ 0,\ \tfrac18,\ 0,\ \tfrac18 \right]. \]
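The three-step quantum walk is small enough to simulate exactly. Below is a minimal sketch that tracks integer numerators of the amplitudes (the common factor $1/\sqrt{2}$ per step is restored only when computing probabilities). The function names `wrap`, `walk`, and `probabilities` are illustrative, not taken from the text.

```python
from fractions import Fraction

H = ((1, 1), (1, -1))   # Hadamard numerators; each step also carries 1/sqrt(2)

def wrap(u, half):
    """Keep node labels in -half..half on the circular version of the path."""
    return (u + half) % (2 * half + 1) - half

def walk(start, steps, half=3):
    """Apply A = P(I (x) H) `steps` times; return numerators per (node, coin)."""
    state = {start: 1}
    for _ in range(steps):
        nxt = {}
        for (u, c), amp in state.items():
            for c2 in (0, 1):                       # new coin value after the flip
                v = wrap(u - 1 if c2 == 0 else u + 1, half)
                nxt[(v, c2)] = nxt.get((v, c2), 0) + amp * H[c2][c]
        state = nxt
    return state

def probabilities(state, steps, half=3):
    """Trace out the coin: sum squared amplitudes over both coin values."""
    return [Fraction(sum(state.get((u, c), 0) ** 2 for c in (0, 1)), 2 ** steps)
            for u in range(-half, half + 1)]

b0 = walk((0, 0), 3)
print(probabilities(b0, 3))
```

Keeping the amplitudes as integers avoids floating-point noise, so cancellations such as the one at (1,1) show up as exact zeros.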


This outcome of diffusion stands in marked contrast to the classical
distributional outcome. What happened, and why the loss of symmetry?

It is at least reassuring that the bipartiteness of G showed through, giving
zero probability again on the even-numbered vertices. But the leftward bias
flies in the face of fairness when flipping a coin. The Hadamard matrix is used
all the time to introduce quantum nondeterminism, so why does it give
off-center results? The reason is that the $-1$ entry biases against “heads,” causing
cancellations that do not happen for “tails.”

These cancellations can be harnessed for a rightward bias by starting the
traveler at (0,1) rather than (0,0). That is, unknown to the traveler or any
parties observing just the original graph G, we are starting the coin in an initial
state of “heads” rather than “tails.” In the $G \otimes C$ space, this means the initial
state $a_1$ has a 1 in the coordinate for (0,1) and a 0 everywhere else. From (0,1),
some relevant three-step paths are:


(0,1) → (1,1) → (2,1) → (1,0) : (-1) · (-1) · 1 = 1

(0,1) → (1,1) → (0,0) → (1,1) : (-1) · 1 · 1 = -1

(0,1) → (-1,0) → (0,1) → (1,1) : 1 · 1 · (-1) = -1

(0,1) → (-1,0) → (0,1) → (-1,0) : 1 · 1 · 1 = 1

(0,1) → (1,1) → (0,0) → (-1,0) : (-1) · 1 · 1 = -1


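These show a mirror-image amplification and cancellation and give the
14-vector

\[ b_1 = \frac{1}{\sqrt{8}} \begin{bmatrix} 0 & 0 & 1 & 0 & -2 & 0 & -1 \\ 1 & 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix}. \]

These amplitudes give the classical probabilities $[\frac18, 0, \frac18, 0, \frac58, 0, \frac18]$ on G, now
biased to the right.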
Hence, you might think that you can cancel the biases by starting the system
up with the coin in the “half-tails, half-heads” state

\[ a_2 = \frac{1}{\sqrt{2}} (a_0 + a_1). \]

This is like saying Schrödinger’s cat has half a tail, or rather the square root
of half a tail. By the linearity of quantum mechanics (remember $A^3$ is just a
matrix) the final state you get now is

\[ b_2 = \frac{1}{\sqrt{2}} (b_0 + b_1) = \frac14 \begin{bmatrix} 0 & 0 & 2 & 0 & -2 & 0 & 0 \\ 2 & 0 & 2 & 0 & 0 & 0 & 0 \end{bmatrix}. \]

Note that we got some more cancellations (that is to say, the two walks
interfered with each other) and those were both on the right-hand side, so we
have bias to the left again with probabilities $[\frac14, 0, \frac12, 0, \frac14, 0, 0]$. In particular,
the rightmost node is now unreached.

We can finally fix the bias by making the second walk occur with a
quarter-turn phase displacement. This means starting up in the state

\[ a_3 = \frac{1}{\sqrt{2}} (a_0 + i a_1). \]

This state is like Schrödinger’s cat with half a tail and an imaginary head.
Again by linearity, the final state is $b_3 = (b_0 + i b_1)/\sqrt{2}$, so that

\[ b_3 = \frac14 \begin{bmatrix} 0 & 0 & 1+i & 0 & -2i & 0 & 1-i \\ 1+i & 0 & 2 & 0 & -1+i & 0 & 0 \end{bmatrix}. \]


Taking the squared norms and adding each column gives the probabilities

\[ \frac{1}{16} [0+2,\ 0,\ 2+4,\ 0,\ 4+2,\ 0,\ 2+0] = \left[ \tfrac18,\ 0,\ \tfrac38,\ 0,\ \tfrac38,\ 0,\ \tfrac18 \right] \]

again, at last modeling the classical random walk.

A more robust way to fix the bias is to use a suitably “balanced” matrix other
than Hadamard for the quantum coin-flip action. A suitable unitary matrix is

\[ J = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & i \\ i & 1 \end{bmatrix}. \]
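The effect of the two superposed starting states under the Hadamard coin can be checked with a short simulation. This is a sketch under the same conventions as the seven-node example (coin 0 steps left, coin 1 steps right, circular wrapping); the function names are illustrative, and the amplitudes are kept as unnormalized Gaussian integers so the final probabilities are exact.

```python
from fractions import Fraction

H = ((1, 1), (1, -1))   # Hadamard numerators; normalization is restored at the end

def walk(state, steps, half=3):
    """Apply P(I (x) H) `steps` times on the cycle with nodes -half..half."""
    for _ in range(steps):
        nxt = {}
        for (u, c), amp in state.items():
            for c2 in (0, 1):
                v = (u + (1 if c2 else -1) + half) % (2 * half + 1) - half
                nxt[(v, c2)] = nxt.get((v, c2), 0) + amp * H[c2][c]
        state = nxt
    return state

def probabilities(state, half=3):
    """Sum squared norms per node (tracing out the coin) and normalize."""
    def norm2(z):
        z = complex(z)
        return round(z.real ** 2 + z.imag ** 2)   # exact for small integer parts
    cols = [sum(norm2(state.get((u, c), 0)) for c in (0, 1))
            for u in range(-half, half + 1)]
    return [Fraction(x, sum(cols)) for x in cols]

no_phase = probabilities(walk({(0, 0): 1, (0, 1): 1}, 3))     # start a_2
with_phase = probabilities(walk({(0, 0): 1, (0, 1): 1j}, 3))  # start a_3
print(no_phase)
print(with_phase)
```

Replacing `H` by the numerators of another coin matrix is a natural experiment, although the code as written only exercises the Hadamard case.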

If we take higher powers $A^k$, whether $A = P(I \otimes H)$ or $A = P(I \otimes J)$ or
whatever, then the circular connectivity of G guarantees that the uniform classical
distribution giving probability $\frac17$ to each node of G is approached. It is stable
as the induced classical distribution.


14.6 The Big Factor 


Now if we extend the graph G beyond seven nodes, then we come immediately
to the surprise of greatest import. Let G have $n = 9$ nodes, labeled $-4$ to 4,
and define $G'$ as before including wrapping at the endpoints. Extend P to be
18 x 18 accordingly, keep H as the action on the coin space, and consider
walks of length 4. Labeling the 16 basic walks of length 4 by HHHH through
TTTT gives us a shortcut to compute the destination $(u, a)$ and multiplier
$b \in \{1, -1\}$ for each walk w:

• u is the number of H in w minus the number of T.

• a is 0 if w ends in T, and 1 if w ends in H.

• b is $-1$ if HH occurs an odd number of times as a substring of w, and 1 if
even.


For example, HHTT, HTHT, and THHT all come back to (0,0), and their
respective multipliers are $-1$, 1, and $-1$. Importantly, this implies there is a
cancellation at the origin, leaving $-1$ there. Similar happens with HTTH, THTH,
and TTHH ending at (0,1), leaving 1, whereas HTTT, THTT, and TTHT all
hit $(-2, 0)$ with weight 1, reinforcing each other to leave 3 there. The full 2 x 9
vector for the quantum state after the walk is:

\[ b_0 = \frac14 \begin{bmatrix} 0 & 0 & 1 & 0 & 1 & 0 & -1 & 0 & -1 \\ 1 & 0 & 3 & 0 & -1 & 0 & 1 & 0 & 0 \end{bmatrix}. \]



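For the initial state (0,1), we have the same rules, except that the multiplier b
is computed with Hw in place of w. This yields

\[ b_1 = \frac14 \begin{bmatrix} 0 & 0 & 1 & 0 & -1 & 0 & 3 & 0 & 1 \\ 1 & 0 & 1 & 0 & -1 & 0 & -1 & 0 & 0 \end{bmatrix}. \]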


Again, $\frac{1}{\sqrt{2}}(b_0 + i b_1)$ results from superposing the initial states with a
90-degree phase shift on the latter. Taking the squared norms of its entries gives

\[ \frac{1}{32} \begin{bmatrix} 0 & 0 & 2 & 0 & 2 & 0 & 10 & 0 & 2 \\ 2 & 0 & 10 & 0 & 2 & 0 & 2 & 0 & 0 \end{bmatrix}. \]

Summing the columns finally yields the desired distribution D of classical
probabilities:

\[ D = \left[ \tfrac{1}{16},\ 0,\ \tfrac38,\ 0,\ \tfrac18,\ 0,\ \tfrac38,\ 0,\ \tfrac{1}{16} \right]. \]


This is the surprise: the five nonzero values differ from
$[\frac{1}{16}, \frac{4}{16}, \frac{6}{16}, \frac{4}{16}, \frac{1}{16}]$,
which is the classical walk’s binomial distribution. The quantum coin-flip
distribution is flatter in the middle, with more weight dispersed to the edges.
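The H/T shortcut for the four-step walk lends itself to a direct enumeration. The sketch below applies the three rules to every coin string w, using Hw for the multiplier when the coin starts as heads, and then combines the two amplitude tables with the 90-degree phase shift. The helper names `count_hh` and `amplitudes` are illustrative.

```python
from fractions import Fraction
from itertools import product

def count_hh(w):
    """Number of (possibly overlapping) occurrences of 'HH' in w."""
    return sum(1 for i in range(len(w) - 1) if w[i:i + 2] == "HH")

def amplitudes(steps, start_heads=False):
    """Integer numerators indexed by (node, coin), via the H/T substring rule."""
    amps = {}
    for flips in product("HT", repeat=steps):
        w = "".join(flips)
        u = w.count("H") - w.count("T")            # destination node
        a = 1 if w.endswith("H") else 0            # final coin value
        b = -1 if count_hh("H" + w if start_heads else w) % 2 else 1
        amps[(u, a)] = amps.get((u, a), 0) + b
    return amps

b0 = amplitudes(4)
b1 = amplitudes(4, start_heads=True)

# For the start (a0 + i*a1)/sqrt(2): |x + i*y|^2 = x^2 + y^2 for real x, y,
# and the overall normalization is (1/4)^2 * (1/2) = 1/32.
D = [Fraction(sum(b0.get((u, a), 0) ** 2 + b1.get((u, a), 0) ** 2 for a in (0, 1)), 32)
     for u in range(-4, 5)]
print(D)
```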

As n increases, this phenomenon becomes more pronounced: the locations
near the origin cumulatively have low probability, whereas most of the
probability is on nodes at distance proportional to n. This phenomenon persists
under various quantum coin matrices. The general reason is that the many
paths under the classical “H-T” indexing that end close to the origin tend to
cancel themselves out, whereas the fewer classical paths that travel do enough
mutual reinforcement that the squared norms compensate for the overall count.

The effect is that, unlike the classical distribution, which for n steps has
standard deviation proportional to $\sqrt{n}$, the quantum distributions have standard
deviation proportional to n. Some implementations make them approach the
uniform distribution on $[-\frac{n}{\sqrt{2}}, \frac{n}{\sqrt{2}}]$, whose standard deviation is $\frac{n}{\sqrt{6}} > 0.4n$,
which is a pretty big factor of n. Thus, the quantum traveler does a lot of boldly
going where it hasn’t been before.
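The contrast between $\sqrt{n}$ and linear spreading can be observed numerically. The following sketch simulates the Hadamard walk from the phase-shifted start on a path assumed long enough that no wrapping occurs, and returns the standard deviation of the measured position; `quantum_std` is an illustrative name.

```python
import math

def quantum_std(steps):
    """Std. dev. of position after `steps` Hadamard-walk steps,
    starting from the (unnormalized) state a0 + i*a1 at node 0."""
    state = {(0, 0): 1 + 0j, (0, 1): 1j}
    for _ in range(steps):
        nxt = {}
        for (u, c), amp in state.items():
            for c2 in (0, 1):
                sign = -1 if (c and c2) else 1          # the -1 entry of H
                key = (u + (1 if c2 else -1), c2)
                nxt[key] = nxt.get(key, 0) + amp * sign / math.sqrt(2)
        state = nxt
    norm = sum(abs(z) ** 2 for z in state.values())     # stays 2 by unitarity
    mean = sum(u * abs(z) ** 2 for (u, _), z in state.items()) / norm
    var = sum((u - mean) ** 2 * abs(z) ** 2 for (u, _), z in state.items()) / norm
    return math.sqrt(var)

for n in (3, 4, 50, 100):
    print(n, round(quantum_std(n), 3), round(math.sqrt(n), 3))  # quantum vs classical
```

For three steps the two values coincide, matching the distribution worked out earlier; the gap opens at four steps and widens from there.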


14.7 Problems 

14.1. Verify the probabilities obtained for the four-step walk on the graph with
nodes labeled $-4$ to $+4$. What happens when this walk is started in the state
$\frac{1}{\sqrt{2}}(a_0 + a_1)$, that is, without the phase shift on the “initial heads” part?

14.2. Work out the amplitudes and probabilities for the three-step walk with 
the J matrix in place of the Hadamard action on the coin space. 

14.3. Can you devise a combinatorial rule for figuring the destination and 
amplitude of basic paths under the J matrix, in terms of the binary code of 
the path, analogous to the counting of HH substrings for the H matrix? 

14.4. Can you use the combinatorial rule for the H matrix to prove that the
amplitude of (0,0) for an n-step walk (n even) on the infinite path graph is
bounded by $\frac{1}{\sqrt{2n}} + \epsilon$, for suitable $\epsilon > 0$?
14.8 Summary and Notes

The key to understanding a classical random walk in a graph is that it is defined
locally. That is, for every node u and every neighbor v of u, define A(u, v) to
be the probability of choosing to walk to v. Then $A^2(u, w)$ is the probability of
reaching node w in exactly two steps given that you started at u, and this carries
over to any power: $A^k(u, v)$ gives the probability of ending at v in exactly k
steps, given that you started at u. Thus, random walks in graphs are just linear
algebra.

The nice thing about quantum walks is that they too are just linear algebra, 
except the entries are complex numbers whose squared absolute values become 
the probabilities. The use of matrix multiplication and summing over paths 
is the same, except that what gets summed are possibly complex amplitudes 
rather than probabilities. The difference—maybe not so nice—is that these 
amplitudes can cancel, thus giving zero probability for certain movements from 
u to w that would be possible in the classical case. This enables piling higher 
probabilities on other movements in ways that cannot be directly emulated 
classically. But again the key is that the definition is local for each “node” 
(which is just a basis state) and gives you a matrix. 

We have explained quantum coins with a “hidden-variables” mentality, and 
one must be careful when combining that with ideas of “local.” However, this 
view has not tried to obscure nonlocal phenomena such as the way certain 
superpositions, for instance, the two coin states without the phase shift, can 
render certain far locations unreachable. We have really tried to emphasize the 
role of linear algebra and combinatorial elements such as the enlarged graph 
G' and the counting of HH substrings. It may seem strange to picture that 
nature tracks the parity of substring counts, but this is evidently the effect of 
what nature does. There are formalisms we haven’t touched such as calculating 
in the Fourier-transformed space of the walk branches, which can avoid an 
exponential amount of work while at least giving good approximations. 

There are some undirected graphs of small diameter, that is, maximum distance
between any pair of nodes, in which a quantum walk diffuses exponentially
faster than a classical one. One such graph glues together two full binary trees
of depth d at the leaves, giving diameter 2d and $n = 3 \cdot 2^d - 2$ nodes overall. A
classical random walk starting from one root quickly reaches the middle but
thereafter has two ways to turn back for every one way forward toward the
other tree’s root, giving hitting time exponential in d although still O(n). In the
corresponding quantum walk, the turn-back options can be made to interfere
enough that the situation essentially becomes a walk on the straight path,
giving O(d) diffusion as above. These and many other results about walks have
recently been comprehensively surveyed by Venegas-Andraca (2012).
Classical random walks on undirected graphs were “de-randomized” via the famous
deterministic logarithmic space algorithm for graph connectivity of Reingold
(2005).

Quantum walks have yielded improved and optimal algorithms for certain
decision problems. Beyond that, they exemplify the idea that quantum
computation is about creating rich quantum states, ones that cannot be readily
simulated by classical means, by which novel solutions can be obtained.

This chapter has followed some material in the survey paper by Kempe 
(2003), but with some more elementary examples and a bridge to the material 
of Santha (2008) and Magniez et al. (2011) incorporated in the next chapter. 
There is also a recent textbook by Portugal (2013) devoted entirely to quantum 
walks and search algorithms; it is comprehensive for this chapter and part of 
the next. 




15 Quantum Walk Search Algorithms 


The last chapter gave an elementary, self-contained treatment of quantum 
walks. The present chapter brings us to developments in quantum algorithms 
within the past 5 to 10 years. Our main purpose is to leverage the ideas of 
the last two chapters to explain a “meta-theorem” that underlies how quantum 
walks serve as an algorithmic toolkit for search problems. 


15.1 Search in Big Graphs 

We have seen that at least on the path graphs, a quantum walk does a good 
and fast job of spreading amplitude fairly evenly among the nodes, rather than 
lumping it near the origin as with a classical random walk. When the graph 
G is bushier and has shorter distances than the path graph, we can hope to 
accomplish such spreading in fewer steps. If we are looking for a node or 
nodes with special properties, then we can regard the evened-out amplitude as 
the springboard for a Grover search. If the graph’s distances relative to its size 
are small, then we can even tolerate the size becoming super-polynomial— 
provided the structure remains regular enough that the action of a coin with d 
basic outcomes can be applied efficiently at any node of degree d. 

For motivation we discuss the following problem, whose solution by Andris 
Ambainis is most credited for commanding attention to quantum walks. 

Element distinctness: Given a function $f: [n] \to [n]$, test whether the
elements $f(x)$ are all distinct, i.e., f is 1-to-1.

The best-known classical method is to sort the objects according to $f(x)$ and
then traverse the sorted sequence to see whether any entry is repeated. If we
consider evaluations $f(x)$ and comparisons to take unit time, then this takes
time proportional to $n \log n$.
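The classical sort-and-scan method is short enough to state in code. This is a minimal sketch in which `f` is any Python function on $\{1, \dots, n\}$; the name `all_distinct` is illustrative.

```python
def all_distinct(f, n):
    """Sort-and-scan test: True iff f is 1-to-1 on {1, ..., n}."""
    values = sorted(f(x) for x in range(1, n + 1))   # about n log n comparisons
    # After sorting, any repeated value must sit in adjacent positions.
    return all(values[i] != values[i + 1] for i in range(n - 1))

print(all_distinct(lambda x: x, 5))                 # identity map
print(all_distinct(lambda x: (x * x) % 5 + 1, 5))   # 1 and 4 both map to 2
```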

If we wish to apply Grover search, then we are searching for a colliding
pair $(x, y)$, $y \neq x$, such that $f(x) = f(y)$. We can implement a Grover oracle for
this test easily enough, but the problem is that there are $\binom{n}{2}$ = order-$n^2$ pairs to
consider. Thus, the square-root efficiency of Grover search will merely cancel
the exponent, leaving O(n) time, which is no real savings considering that the
encodings of x, y and values of f really use $O(\log n)$ bits each.

The idea is to make a bigger target for the Grover search. Let $r > 2$ and
consider subsets R of r-many elements. Call R a “hit” if f fails to be 1-to-1
on R. Testing this might seem to involve recursion, but we will first expend r
quantum steps to prepare a superposition of states that include the values $f(u_i)$
for every r-tuple $(u_1, \dots, u_r) \in R$ in a way that the hit-check for every R is
recorded. Thus, the preparation time for the walk is reckoned as proportional
to r, and the hit-check needs no further evaluation of f.

However, now we have order-$n^r$ many subsets, which seems to worsen the
issue we had with Grover search on pairs. This is where quantum walks allow
us to exploit three compensating factors:

1. The subsets have a greater density of “hits”: any $r - 2$ elements added to a
colliding pair make a hit. Hence, the hit density is at least

\[ \binom{n-2}{r-2} \Big/ \binom{n}{r} = \frac{r(r-1)}{n(n-1)} \approx \left( \frac{r}{n} \right)^2. \]

In general, we write E for the reciprocal of this number.

2. Rather than make a Grover oracle that sits over the entire search space, we 
can make a quantum coin that needs to work only over the d neighbors of a 
given node. 

3. If we make a degree-d graph that is bushy enough, then we can diffuse
amplitude nearly uniformly over the whole graph in a relatively small
number of steps.
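The hit-density bound in the first point is a ratio of binomial coefficients that simplifies exactly. A quick sketch using exact rational arithmetic follows; the function name and the sample values n = 100, r = 10 are illustrative.

```python
from fractions import Fraction
from math import comb

def hit_density_lower_bound(n, r):
    """Fraction of r-subsets of [n] that contain one fixed colliding pair."""
    return Fraction(comb(n - 2, r - 2), comb(n, r))

density = hit_density_lower_bound(100, 10)
print(density, 1 / density)   # the bound and its reciprocal E
```

With these sample values the bound is 1/110, close to $(r/n)^2 = 1/100$, so E is about 110.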

The walk steps in the last point represent an additional time factor compared 
with a Grover search, but the other two points reduce the work in the iterations. 
This expounds the issues for search in big graphs. Now we are ready to outline 
the implementation. 

In thinking about the element-distinctness example for motivation in what
follows, note that the vertices of the graph G are not the individual elements
x, y but rather the (unordered) r-tuples of such elements, corresponding to sets
R. The adjacency relation of G in this case is between R and $R'$ that share $r - 1$
elements, so that $R'$ is obtained by swapping one element for another. This
defines the so-called Johnson graph $J_{n,r}$. Although all our examples are on
Johnson graphs, the formalism in the next section applies more generally.
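The adjacency relation of the Johnson graph is easy to generate explicitly for small parameters. The sketch below lists the neighbors of one r-subset; the function name `johnson_neighbors` and the choice of elements $\{1, \dots, n\}$ are illustrative.

```python
def johnson_neighbors(R, n):
    """Neighbors of the r-subset R of {1..n} in the Johnson graph:
    swap one element of R for one element outside R."""
    R = frozenset(R)
    outside = [y for y in range(1, n + 1) if y not in R]
    return {(R - {x}) | {y} for x in R for y in outside}

nbrs = johnson_neighbors({1, 2, 3}, 6)
print(len(nbrs))   # each node has r * (n - r) neighbors
```

Every node therefore has the same degree $d = r(n - r)$, so the graph is regular.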

The final major point is that to encode an r-tuple as a binary string, we need
$O(r \log n)$ bits, and hence at least that many qubits. Because $r = r(n)$ will often
be $n^c$ for some constant c, this is a higher order than a compact encoding of [n]
as $\{0,1\}^\ell$ via $\ell \approx \log_2 n$ qubits would give us. This is why we call the graphs
“big.”
The main impact is that we will not be able to hide factors of r under $\tilde{O}$
notation the way we can with factors of $\ell$. This will entail distinguishing the
following three chief cost measures:

1. Quantum serial time, which is identified with the quantum circuit size; 

2. Quantum parallel time, which is the quantum circuit depth meaning the 
maximum number of basic gates involving any one qubit; and 

3. Quantum query complexity, which is the number of evaluations of the
oracle $U_f$ for f.

Usually these costs follow this order from highest to least. Provided $f(u_i)$ is
computable (classically) in time $\ell^{O(1)}$ for any element $u_i$ in a tuple,
one can hope for either time to be $\tilde{O}$ of the query complexity. The main
issue is looking up a desired or randomly selected $u_i$, sometimes also a stored
function value $f(u_i)$, in a tuple. If the tuple is sorted, then binary search will
work in $O(\log r)$ stages and $O(\log r) = \ell^{O(1)}$ depth (the literature also speaks
of “random-access time”). However, the need to use gates on all r elements
can cause an extra factor of r in the circuit size or serial time.


15.2 General Quantum Walk for Graph Search 

The first idea in formulating quantum walks generically is that the coin space 
need not be coded as $\{1, \dots, d\}$ but can use a separate copy of the node space.
Then the expanded graph G' becomes the edge graph of G. We still reference 
nodes X,Y,Z... in our notation; the capital letters come from thinking of G as 
a big graph as above. The previous node X of a walk now at node Y is preserved 
as with the previous coin state in the current state (X, Y). Execution of a step 
to choose a next node Z is achieved by changing ( X , Y) to (Y,Z). This can be 
done by treating the first coordinate as the “coin space” and replacing X by 
a random Z to make (Z, Y), then either permuting coordinates to make (Y,Z) 
explicitly or leaving (Z, Y) as-is and being sure to treat the other coordinate 
with Y as the coin space next. 

The next two ideas come out of Grover search. As before, the goal is to 
concentrate amplitude on one or more of the “hit” nodes. This is the reverse 
of the notion of a walk starting from that node or nodes, which would diffuse 
out to uniform probability. Reversal, however, is “no problem” for quantum 
computation. Hence, we can start up in a stationary distribution of the classical 
walk on G, which by the first idea will extend naturally to the quantum walk 
because $G'$ uses a copy of the nodes of G. For a d-regular graph, the uniform
distribution is stationary. 

Of course we cannot expect that a generic walk will run in reverse to every
possible start node because that would not be invertible. The third idea is to
combine the diffusion step with a Grover-type sign flip upon detecting that a
node reached on the current edge is a hit. This will drive amplitude onto the hit
nodes. Accordingly, we define a diagonal unitary operator $U_f$ on our
doubled-up graph space as a matrix with diagonal entries

\[ U_f[XY] = \begin{cases} -1 & \text{if } X \text{ or } Y \text{ is a hit,} \\ 1 & \text{otherwise.} \end{cases} \]


The game now becomes: how can we minimize effort while computing this
$U_f$, and how can we identify it with the Grover reflection-rotation scheme? We
can define the hit vector h and the orthogonal miss vector m much as before,
except now we have $h(XY) = 1$ (suitably normalized) whenever X or Y is a
hit, $m(XY) = 1$ when neither is. The stationary distribution $\pi$ of the walk on
the paired-up graph space will give us a quantum start vector $\pi$ analogous to j
in chapter 13, again belonging to a two-dimensional space $S_0$. The following,
however, causes tension between the two objectives:


To minimize the cost of updates for the walk, we use functionally
superposed states $s_{XY} = |XY\rangle |g(XY)\rangle$ where $|g(XY)\rangle$ is “data”
to facilitate the checking and possibly the updating steps. However,
as explored in problem 13.9 in chapter 13, the values $g(XY)$ in the
extended space $S_g$ upset the geometry of two reflections producing
a rotation toward h that we would enjoy in the simple XY space $S_0$
if g were absent (or constant).


A trick we could apply to solve the immediate problem is the
“compute-uncompute” trick in section 6.3, as employed by the Grover approximate
counting algorithm in section 13.5. In this case, we would first uncompute
$|XY\rangle |g(XY)\rangle$ to $|XY\rangle |0 \cdots 0\rangle$, apply the reflections in $S_0$, and then recompute
the data from scratch to set up the next walk step. However, this defeats the
purpose of using the neighborhood data $g(XY)$ to make the walk more
efficient because uncomputing and recomputing the neighborhood data items from
scratch is more expensive than simply updating them.
The best solution known today works in a hybridization of $S_g$ and $S_0$ that
approximates the reflection action in $S_0$. The detailed proof that the
approximation is close enough to make the rotation mechanism work is beyond our
scope (see end notes for references). Fortunately, the theorem that this proves
has a nice “toolkit” statement so that applications can spare these details and
concentrate on the cost parameters, which can accommodate any of our three
complexity measures. Here is our first statement of the five parameters:

• S: the setup cost for the functional superpositions giving “data” $D = g(XY)$.

• U: the update cost for a walk step in the graph, which when repeated will
approximate a reflection about $\pi$. By intent it equals the cost for a step in
the underlying classical walk, although sometimes this involves blurring a
distinction between $O(1)$ and $O(\log n)$ or $O(\log d)$ for the update.

• C: the checking cost for (the reflection effecting) each Grover query $U_f$.

• E: the reciprocal density of hits after the initial setup, i.e., E = N/k. 

• D: the reciprocal of the eigenvalue gap A defined below, which governs the 
spread of the walk. 

The next two sections give all the details on how the walk is implemented 
so that the algorithms governed by these cost parameters can succeed. 


15.3 Specifying the Generic Walk 

We first implement the generic walk in the form of reflections. Let $p_{X,Y}$ give
the probability of going from X to Y in the standard classical walk on G and
$p^*_{Y,X}$ the probability in the reverse walk. For a stationary distribution $\pi$ of the
forward walk on V(G), $p^*_{Y,X}$ obeys the equation $\pi_Y p^*_{Y,X} = \pi_X p_{X,Y}$. The walk is
reversible if in addition $p^*_{X,Y} = p_{X,Y}$. On a regular graph $\pi$ is the uniform
distribution, so a reversible walk gives $p_{Y,X} = p^*_{Y,X} = p_{X,Y}$ and hence is also
symmetric. But we can define the following “right” and “left” reflection operators
even for a general walk:

\[ V_R[XY, XZ] = 2\sqrt{p_{X,Y}\, p_{X,Z}} - e(Y, Z) \]

\[ V_L[XZ, YZ] = 2\sqrt{p^*_{Z,X}\, p^*_{Z,Y}} - e(X, Y). \]

Here $e(Y, Z)$ is a discrete version of the Dirac “delta” function giving 1
if $Y = Z$ and 0 otherwise. If $W \neq X$, then $V_R[XY, WZ] = 0$, and similarly
$V_L[XZ, YW] = 0$ if $W \neq Z$.
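The two operators can be built entry by entry for a small graph to confirm that each squares to the identity, as a reflection should. The sketch below assumes a reversible walk with uniform stationary distribution (so that $p^* = p$) on a triangle; the helper names and the indexing of the pair XY as `X * N + Y` are illustrative choices.

```python
import math

def reflections(p):
    """Build V_R and V_L as nested lists from a transition matrix p.
    Assumes p* = p. The pair XY is stored at index X * N + Y."""
    N = len(p)
    dim = N * N
    VR = [[0.0] * dim for _ in range(dim)]
    VL = [[0.0] * dim for _ in range(dim)]
    for X in range(N):
        for Y in range(N):
            for Z in range(N):
                # V_R[XY, XZ] = 2 sqrt(p_XY p_XZ) - e(Y, Z)
                VR[X * N + Y][X * N + Z] = 2 * math.sqrt(p[X][Y] * p[X][Z]) - (Y == Z)
                # V_L[XZ, YZ] = 2 sqrt(p_ZX p_ZY) - e(X, Y)
                VL[X * N + Z][Y * N + Z] = 2 * math.sqrt(p[Z][X] * p[Z][Y]) - (X == Y)
    return VR, VL

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def is_identity(M, tol=1e-9):
    n = len(M)
    return all(abs(M[i][j] - (i == j)) < tol for i in range(n) for j in range(n))

p = [[0, 0.5, 0.5], [0.5, 0, 0.5], [0.5, 0.5, 0]]   # simple walk on a triangle
VR, VL = reflections(p)
W = matmul(VL, VR)                                   # one step of the walk
print(is_identity(matmul(VR, VR)), is_identity(matmul(VL, VL)))
```

The product `W` of the two reflections is then an orthogonal matrix, which is the real-valued case of unitarity.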






We can define these operators more primitively by coding the walk directly.
Let $Y_0$ be some basis state in the coin space; usually it is taken to be the node
coded by the all-zero string, but this is not necessary. Take $P_R$ to be any unitary
operator such that for all X and Y,

\[ P_R[XY, XY_0] = \sqrt{p_{X,Y}}. \]

For $Z \neq Y_0$, $P_R[XY, XZ]$ may be arbitrary, but $P_R[XY, WZ] = 0$ whenever
$W \neq X$. Among concrete possibilities for $P_R$, we could let $Y_0$ be the state coded by
the all-1 string instead and control on it so that the action on $Z \neq Y_0$ is the
identity, or we could define $P_R[XY, XZ] = \sqrt{p_{X,\, Y \oplus Y_0 \oplus Z}}$. It turns out not to
matter. Define the projector and reflection about $Y_0$ by

\[ P_0[XY, WZ] = \begin{cases} 1 & \text{if } W = X \wedge Y = Z = Y_0, \\ 0 & \text{otherwise;} \end{cases} \]

\[ J_0[XY, WZ] = 2P_0[XY, WZ] - e(XY, WZ) = \begin{cases} 1 & \text{if } X = W \wedge Y = Z = Y_0, \\ -1 & \text{if } X = W \wedge Y = Z \neq Y_0, \\ 0 & \text{otherwise.} \end{cases} \]

Our two unitary reflections hence enjoy the following equalities:

\[ V_R = P_R J_0 P_R^{*} \]
\[ V_L = P_L J_0 P_L^{*} \]

Now the following blends a Grover search with the quantum walk:

Definition 15.1 The generic quantum walk derived from the classical
walk $P = (p_{X,Y})$ on the graph G is the walk on $V(G) \otimes V(G)$ defined by
iterating the step operation

\[ W_P = V_L V_R, \]

and the generic search algorithm iterates alternations of $U_f$ and $W_P$, or more
generally, $U_f$ followed by a routine involving one or more walk steps.

As a footnote, there are allowable variations in $U_f$ as well as the walk steps.
Provided G is not bipartite, it is OK to omit a check for Y being a hit in the
definition of $U_f$. That is, if we define

$U_L[XY] = -1$ if X is a hit, $U_L[XY] = 1$ otherwise;

$U_R[XY] = -1$ if Y is a hit, $U_R[XY] = 1$ otherwise,

then it suffices to iterate $W_P U_L$ without checking whether we found a hit in the
Y coordinate. Perhaps more elegantly, we can iterate $V_L U_L V_R U_R$. We do not
know whether the concrete effects on implementations have been all worked
out in the recent literature, likewise with particular choices for $P_R$ and the
analogous $P_L$, but asymptotically they give equivalent results.


15.4 Adding the Data 

The data can be specified separately for the nodes X, Y in a graph edge, and we
find it intuitive to separate $g(XY)$ as $D_X, D_Y$ for indexing. To encode the data in
extra indices, we can re-create all the above encoding of operators with $XD_X$
in place of X. Everything goes through as before, technically because we will
have $p_{XD_X, YD_Y} = 0$ whenever $YD_Y$ does not match the data update when going
from node X with $D_X$ to node Y.

These considerations also factor into the initial state of the walk. Let $X_0$ be
the node coded by the all-zero index, $Y_0$ the same but on the right-hand side of
indexing XY, and $D_0$ the data associated to $Y_0$. One possibility for the initial
state $a_0$ of the walk is defined by

\[ a_0(XY_0) = \sqrt{\pi_X}, \]

and $a_0(XY) = 0$ for $Y \neq Y_0$. This can be interpreted either as starting at the
all-zero node with the coin in a stationary-superposed state or having the coin
initialized to “zero” with the walk initially in the classical stationary
superposed state. On a regular N-node graph G, we always have $\pi_X = 1/N$. Now if
we throw in the data, this technically means initializing the state

\[ a_0(XD_X Y_0 D_0) = \frac{1}{\sqrt{N}} \]

with $a_0(XDYD') = 0$ whenever $D \neq D_X$, $Y \neq Y_0$, or $D' \neq D_0$. Because getting
uniform superpositions is relatively easy, and the adjacency relation of G is
usually simple, the difficulty in preparing $a_0$ actually resides mainly in
determining the associated data. The same applies to the update cost: the time
taken to implement the concrete $V_R$ and $V_L$ via extended versions of $P_R$ and
$P_L$ is mainly for getting the data associated with the node Y traversed from X.

Finally, in the concrete version of the “flip” $U_f$, there is the cost of
checking the associated data to see whether the current node is a “hit.” For these
steps and checks to be done in superposition, they must be coded for
accomplishment by linear transformations. All of this is done by jockeying indices.
Our point is not so much to argue that our notation is more intuitive or less
cumbersome than standard notation, such as $|XY_0\rangle \langle XY_0|$ for the projector
onto $V(G) \otimes |Y_0\rangle$, or $|X\rangle |D_X\rangle$ to carry along the associated data, but rather to
explore lower level details and suggest possible alternate encodings.


15.5 Toolkit Theorem for Quantum Walk Search 

The last factor for efficient quantum walks is that the underlying graph $G$ be
sufficiently “bushy” relative to its size $N$. The formal notion is that $G$ be an
expander, meaning that there is an appreciably large value $h(G)$ such that for
all sets $T$ of at most $N/2$ nodes, there are at least $h(G)|T|$-many edges going to
nodes outside $T$. This implies that there are at least $h(G)|T|/d$ different nodes
outside $T$ that can be reached in one step, but it is separately significant to have
many edges that can diffuse amplitude into these neighboring nodes. An important
lower bound on $h(G)$ is provided by the difference between the largest
eigenvalue and the second largest absolute value of an eigenvalue of the adjacency matrix of $G$,
which is called the eigenvalue gap and denoted by $\gamma(G)$.¹ The bound is

$$\tfrac{1}{2}\gamma(G) \le h(G).$$

For a $d$-regular graph, we finally define

$$D = \frac{d}{\gamma(G)}.$$

This is also the reciprocal of the eigenvalue gap $\delta$ of the stochastic matrix
of the underlying standard classical walk on $G$. Note that as with $E = N/k$,
we are suppressing the dependence on the original problem-size parameters
$n$ and $d$ with this notation. The point of doing this, and likewise with the
setup cost $S = S(n)$, the update cost $U = U(n)$, and the solution-checking cost
$C = C(n)$, is that the total cost of many quantum walk search algorithms can
be expressed entirely in these five terms. The following “toolkit” main theorem
applies to any cost measure that satisfies some minimal assumptions, such as
being additive when routines are sequenced, and in particular applies to measures
of time, quantum circuit size, quantum circuit “depth” (i.e., the maximum
number of gates on any qubit, which serves as an idea of parallel time), and
the count of superposed queries to the search function $f$.


¹ More common is $\Delta(G)$, but our sources use $\Delta(P)$ for the phase gap of the quantum walk as
discussed below.
THEOREM 15.2 For any search problem on a uniform family of undirected
graphs $G_n$ whose adjacency matrices yield walks with diffusion parameters $D$,
one can design a quantum walk with setup, update, and checking phases, such
that if their respective costs are bounded by $S$, $U$, $C$, respectively, and if the
initial reciprocal hit density is $E$, then the overall cost to achieve correctness
probability at least 3/4 is bounded above by a constant times

$$S + \sqrt{E}\,(U\sqrt{D} + C). \qquad (15.1)$$

We sketch the proof as an algorithm, mindful of detail dropped in one step, 
and of D and the other parameters really being functions of n. 

15.5.1 The Generic Algorithm 

As in chapter 13, we first state the algorithm in the case that the number $k$ of
hits, and hence $E$, is known. The revision for the unknown-$k$ case is essentially
the same as in section 13.4, so we omit it. The constants $c$ and $c'$ will
come from details of the approximation step that we skip, but it is enough to
know that they are reasonable constants.

1. The start vector $a_0$ is the superposition $\pi$ derived from the stationary distribution
of the underlying classical random walk, augmented with data into
a functional superposition. That is,

$$a_0 = \sum_{X,Y} \sqrt{\pi_{X,Y}}\,|X\rangle|D_X\rangle|Y\rangle|D_Y\rangle.$$

This is further tensored with $c'\sqrt{E}\log(D)$ ancilla qubits set to 0, which
are used to run the approximation routine and offset deviations from “true”
reflections that are caused by the data qubits.

2. Repeat $c\sqrt{E}$ times:

2.1 Apply the Grover reflection about the “miss” vector in the form of a
sign flip, which for reasons discussed in section 6.5 is not affected by
the extra “data” and ancilla coordinates.

2.2 Apply a recursive procedure, whose details from Magniez et al. (2011)
we elide, whose major part and cost involves $c'\sqrt{D}$ steps of the quantum
walk, which together approximate a reflection about $\pi$.

3. Measure the final state $a$, giving a string $x \in \{0,1\}^n$.

4. If $x$ is a solution, then stop. Otherwise repeat the entire process.
15.5.2 The Generic Analysis

Proof sketch for theorem 15.2. By the definition of the setup and checking
costs, step 1 costs $S$ and each iteration of step 2.1 costs $C$. The latter holds
by definition even when step 2.1 is a complicated routine that involves branching
or recursion, as in examples to come.

Although we have elided details, the point in step 2.2 is that the efficacy
of a quantum walk $P$ rests on its phase gap $\Delta(P)$ being proportional (at least) to
the square root of the eigenvalue gap $\delta$ of the underlying classical walk. This
channels the observation about the quadratic advantage in spreading made at
the end of chapter 14. In our form using the reciprocal notation $D$, this comes
out as at most $c'\sqrt{D}$ walk steps, each of cost at most $U$. Originally the “$c'$” involved a factor of $\log(E)$,
but recursive use of phase estimation by Magniez et al. (2011) avoids it. Hence,
the total time for the algorithm is of order

$$S + \sqrt{E}\,(U\sqrt{D} + C).$$

Modulo the (substantial) omitted details, this proves theorem 15.2. □


The way to apply this theorem is to find appropriate graphs on which to
model the search problem at different problem sizes $n$, get the $D(n)$ and $E(n)$
values from the graphs, and design additional features associated to the nodes
(if needed) to balance out $S(n)$, $U(n)$, and $C(n)$. Before showing how this
theorem plays out in some new examples, we pause to review Grover search in
this formalism.
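
As a rough aid for applying the theorem, the bound (15.1) can be evaluated numerically once the five parameters are fixed. The following Python sketch is only illustrative: the function name `walk_search_cost` and the sample values are our own assumptions, and the constant factor hidden in the theorem is ignored.

```python
import math

def walk_search_cost(S, U, C, E, D):
    """Evaluate S + sqrt(E) * (U * sqrt(D) + C), i.e. (15.1) without its constant factor."""
    return S + math.sqrt(E) * (U * math.sqrt(D) + C)

# Hypothetical sample values, chosen only to show how the terms combine:
# 100 + sqrt(10000) * (1 * sqrt(100) + 1) = 100 + 100 * 11.
print(walk_search_cost(S=100, U=1, C=1, E=10_000, D=100))  # 1200.0
```

Keeping the five parameters as separate arguments mirrors how the examples in the following sections tabulate them before combining them.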


15.6 Grover Search as Generic Walk 

Here $G_N$ is the complete graph on $N$ vertices, $N = 2^n$. The graph is implicit,
with vertices $x \in \{0,1\}^n$ being the only representation. Because $G_N$ has degree
$N - 1$ and the next highest eigenvalue in absolute value is known to be 1, the gap is $N - 2$.
Hence, $D(n) = (N-1)/(N-2) \approx 1$. Suppose at least $k$ of the $N$ nodes are
hits. Then $E(n) \le N/k$.
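
The eigenvalue claims for the complete graph can be checked by hand on a small instance. The sketch below is our own illustration in plain Python; the choice $N = 8$ and the helper names are assumptions made for the example. It verifies one eigenvector for each of the two eigenvalues and then forms $D$ as degree divided by gap.

```python
from fractions import Fraction

def complete_graph_adjacency(N):
    """Adjacency matrix of the complete graph on N vertices (no self-loops)."""
    return [[0 if i == j else 1 for j in range(N)] for i in range(N)]

def mat_vec(A, v):
    """Plain matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

N = 8
A = complete_graph_adjacency(N)

# The all-ones vector is an eigenvector with eigenvalue N - 1 (the degree).
ones = [1] * N
assert mat_vec(A, ones) == [N - 1] * N

# A vector whose entries sum to zero is an eigenvector with eigenvalue -1,
# so the second largest absolute value of an eigenvalue is 1.
v = [1, -1] + [0] * (N - 2)
assert mat_vec(A, v) == [-x for x in v]

gap = (N - 1) - 1            # eigenvalue gap N - 2
D = Fraction(N - 1, gap)     # degree divided by gap
print(gap, D)                # 6 7/6
```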

The setup takes one stage of $n$ parallel Hadamard gates, which can be reckoned
as $n$ if counting basic gates as a measure of sequential time, $O(1)$ if
counting circuit depth as a notion of parallel time, or 0 if counting queries
to $f$, i.e., applications of $U_f$. The checking cost is one such query but may be
best regarded as $n^{O(1)}$ for both parallel and sequential time. The update cost
can be reckoned as 0 if counting queries, 1 if counting reflections, or $O(n)$ if
counting the number of basic gates for the reflection about $j$. Using sequential
time as the cost measure, we have:


• $S = O(n)$;

• $U = O(n)$;

• $C = n^{O(1)}$;

• $E = N/k$;

• $D = O(1)$;

$$T = S + \sqrt{E}\,(U\sqrt{D} + C) = O(n) + \sqrt{N/k}\,\bigl(O(n) + n^{O(1)}\bigr).$$

This accords with theorem 13.1. In particular for $k = 1$, the time is $\tilde{O}(2^{n/2})$.
Note that the quantum walk architecture for controlling the search preserves
the guarantee of faster time if the number $k$ of hits is large. If we are counting
queries, we can simplify by regarding $f(x)$ as data associated to the node $x$,
giving $U = 1$, $C = 0$, and $T = O(\sqrt{N/k})$ queries.


15.7 Element Distinctness 

Recall the problem is: given a function $f : [n] \to [n]$, test whether the elements
$f(x)$ are all distinct, i.e., $f$ is 1-to-1. Also recall from section 15.1 our intent to
amplify by taking $r$-tuples of elements as nodes of a graph, so going beyond
$\ell \sim \log_2 n$ qubits.

Several kinds of graphs work, but the Johnson graphs $J_{n,r}$ are the original
and most popular choice. Recall that $J_{n,r}$ has a node for each $R \subset [n]$ of size $r$,
and edges connecting $R, R'$ when they have $r - 1$ elements in common; note
that the complete graph equals $J_{n,1}$. The degree of $J_{n,r}$ is $r(n-r)$ because
every edge involves deleting one element $u$ from $R$ and swapping in one element
$v$ not in $R$. The second eigenvalue is $r(n-r) - n$, so the gap is $n$ provided
$r \ge 2$. Thus, $D(n) = \frac{r(n-r)}{n} = r - r^2/n < r$. From the above hit density
we have $E(n) = (n/r)^2$. The goal is to choose $r$ as a function of $n$ to balance
and minimize the total cost.
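
The degree and hit-density figures for the Johnson graph can be sanity-checked on a small case. The sketch below is our own illustration: the parameters $n = 7$, $r = 3$, the fixed colliding pair $\{0, 1\}$, and the helper name `johnson_neighbors` are all assumptions made for the example. It confirms the degree $r(n-r)$ and shows that the exact hit density $r(r-1)/(n(n-1))$ is roughly $(r/n)^2$.

```python
from itertools import combinations
from fractions import Fraction

def johnson_neighbors(R, n):
    """Neighbors of the r-subset R in J(n, r): swap one element out and one in."""
    inside = set(R)
    outside = [v for v in range(n) if v not in inside]
    return [frozenset((inside - {u}) | {v}) for u in inside for v in outside]

n, r = 7, 3
nodes = [frozenset(c) for c in combinations(range(n), r)]

# Every node has exactly r * (n - r) distinct neighbors.
assert all(len(set(johnson_neighbors(R, n))) == r * (n - r) for R in nodes)

# Fraction of nodes that contain one fixed colliding pair {0, 1}.
hits = [R for R in nodes if {0, 1} <= R]
hit_density = Fraction(len(hits), len(nodes))
assert hit_density == Fraction(r * (r - 1), n * (n - 1))
print(len(nodes), r * (n - r), hit_density)   # 35 12 1/7
```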

The quantum algorithm needs to initialize not only a uniform superposition
over nodes $R = (u_1,\dots,u_r)$ but also the values $f(u_1),\dots,f(u_r)$ for its
elements, and to sort the latter locally. This requires $r$ linearly superposed
queries to $f$ and $\tilde{O}(r)$ time overall assuming $f(u_i)$ is computable in time $|u_i|^{O(1)}$. The update
needs to remove the value $f(u_i)$ and add a value $f(v)$ when $u_i$ is swapped out
for $v$ in the walk. Here is mainly where the cost measures diverge. The query
cost is just 2, and a binary search to maintain sortedness of the tuple and do
the update can work in $O(\log r)$ stages and $\tilde{O}(\log r)$ random-access/parallel
time overall. The need for a circuit, however, to touch all $r$ elements in a tuple
takes serial time out of our savings picture. Thus, we keep tabs on the query
and parallel time complexities, and we have:

• $S = \tilde{O}(r)$;

• $U = \tilde{O}(1)$, with two queries;

• $C = 1$, with zero queries because results are kept in the data;

• $E = (n/r)^2$;

• $D < r$;

$$T = S + \sqrt{E}\,(U\sqrt{D} + C) = \tilde{O}(r) + \frac{n}{r}\,\tilde{O}(\sqrt{r}) = \tilde{O}\!\left(r + \frac{n}{\sqrt{r}}\right).$$

This is balanced with $r = n^{2/3}$, giving parallel time $\tilde{O}(n^{2/3})$ and query complexity
a clean $O(n^{2/3})$. It is known that $\Omega(n^{2/3})$ queries are necessary, so this
bound is asymptotically tight (“up to tilde” on parallel time).
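
The balancing can also be seen numerically. The sketch below is our own illustration, not part of the algorithm: it takes the model cost $r + n/\sqrt{r}$ with all constants and log factors dropped (an assumption), and brute-forces the best integer $r$ for a few values of $n$. The exact minimizer of this model is $(n/2)^{2/3}$, so the ratio to $n^{2/3}$ should sit near the constant $2^{-2/3} \approx 0.63$, confirming that the optimal $r$ scales as $n^{2/3}$.

```python
def walk_cost(n, r):
    """Model cost r + n / sqrt(r): the final expression for T without tildes or constants."""
    return r + n / r ** 0.5

def best_subset_size(n):
    """Brute-force the integer r in [1, n] that minimizes the model cost."""
    return min(range(1, n + 1), key=lambda r: walk_cost(n, r))

for n in (10**3, 10**4, 10**5):
    r = best_subset_size(n)
    # The ratio r / n^(2/3) stays close to 2^(-2/3) as n grows.
    print(n, r, round(r / n ** (2 / 3), 2))
```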


15.8 Subgraph Triangle Incidence 

Given an $m$-node subgraph $H$ of an $n$-node graph $G$, does $H$ have an edge of a
triangle in $G$? Given a fixed edge $(u,v)$ in $H$, we could do a $\sqrt{n}$-time Grover
search for $w$ such that $(u,w)$ and $(v,w)$ are also edges. But iterating this through
possibly order-$m^2$ edges in $H$ is clearly prohibitive. We will apply the savings
from our major example of element distinctness and, hence, focus on query
complexity and parallel time.

For each $w$, we instead do a quantum walk to find $(u,v)$. We take $r = m^{2/3}$
and define a subset $R$ of the nodes of $H$ to be a hit if it contains a suitable edge
$(u,v)$. Using a walk on the Johnson graph $J_{m,r}$ plus data consisting of whether
$(u,v)$, $(u,w)$, $(v,w)$ are all edges, all the parameters are the same as for element
distinctness, so the parallel time is $\tilde{O}(m^{2/3})$.

This in turn becomes the checking time for the Grover search. Using the
formula again, the overall parallel time is $\tilde{O}(n^{1/2} m^{2/3})$. We can amplify both
the check and the Grover success probability to be at least 7/8, so as to yield
3/4 on the whole.
15.9 Finding a Triangle

Now we call an $m$-node subset $R$ of $V(G)$ a hit if the induced subgraph $H$
includes an edge of a triangle in $G$. The $E$ and $D$ will hence be the same as
for element distinctness with “$m$” replacing “$r$.” We may use the previous item
to obtain checking cost $C(n) = \tilde{O}(n^{1/2} m^{2/3})$. The one thing that is different is
that the setup and update costs are higher. The setup needs to encode the entire
adjacency matrix of $H$, and when the update swaps in a vertex $v'$ and swaps
out a $v$, it needs to update the $m - 1$ adjacencies of $v'$ while erasing those of $v$.
Thus, we have setup cost $O(m^2)$ and update cost $O(m)$, giving:

• $S = O(m^2)$;

• $U = O(m)$;

• $C = \tilde{O}(n^{1/2} m^{2/3})$;

• $E = (n/m)^2$;

• $D < m$;

$$T = S + \sqrt{E}\,(U\sqrt{D} + C) = O(m^2) + \frac{n}{m}\,\tilde{O}\bigl(m\sqrt{m} + n^{1/2} m^{2/3}\bigr).$$

This time, the setup cost drops out of the equation: the balancing is between
the update and checking cost and is achieved when $m^{3/2} = n^{1/2} m^{2/3}$, that is,
when $m^{5/6} = n^{1/2}$, so $m = n^{3/5}$. This results in the overall parallel time

$$\tilde{O}(n^{6/5} + n^{2/5} n^{9/10}) = \tilde{O}(n^{13/10}).$$


The query complexity is just $O(n^{13/10})$. References cited in the end notes have
since improved the query complexity to $O(n^{9/7}) = O(n^{1.2857\ldots})$ by other methods,
although the effects on various reckonings of time are less clear.
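
The exponent arithmetic behind the choice $m = n^{3/5}$ is easy to mis-track, so here is a small check using exact fractions. This is our own sketch: it writes $m = n^a$, ignores log factors, and records the exponent of $n$ in each of the three terms of $T$; the function name `exponents` is an assumption made for the example.

```python
from fractions import Fraction as F

def exponents(a):
    """Exponents of n in the three terms of T when m = n**a (log factors ignored)."""
    setup = 2 * a                              # m^2
    update = 1 - a + F(3, 2) * a               # (n/m) * m * sqrt(m)
    check = 1 - a + F(1, 2) + F(2, 3) * a      # (n/m) * n^(1/2) * m^(2/3)
    return setup, update, check

# Balancing update against check means 1 + a/2 = 3/2 - a/3.
a = F(1, 2) / (F(1, 2) + F(1, 3))
setup, update, check = exponents(a)
print(a, setup, update, check)   # 3/5 6/5 13/10 13/10
```

The setup exponent $6/5$ is below $13/10$, which is why the setup cost drops out of the balance.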

The best known classical algorithm for finding a triangle takes the adjacency
matrix $A$, squares it, and hunts for $i,j$ such that $A^2[i,j] \cdot A[j,i] > 0$. It hence
runs in time $O(n^{\omega})$, where the exponent $\omega$ of matrix multiplication is known to
be at most 2.372. Is there any square-root relation whereby $\omega/2$ might become
the best approachable exponent for the query complexity or (parallel) quantum
time? Currently that would be a target of $O(n^{1.185})$. No quantum lower bound
higher than linear is known for triangle detection, whereas no lower bound
higher than two is known for $\omega$. See the end notes for further references.
15.10 Evaluating Formulas and Playing Chess

Of course we must end by playing chess. Indeed, the original Grover search
problem could be stylized as that of finding a winning move in a chess position,
when the fact that it is winning can be verified instantly once you see it, but
you are a poor enough player that even a checkmate is hard to find. If there are
$L$ possible moves (and if we assume unit time per move even on a board with
$n$ squares where $L$ depends on $n$), then a Grover search says we can find the
move in $O(\sqrt{L})$ expected time.

When the win is not instantly verifiable, however, we are in a much harder
situation than a Grover search. Suppose we know in advance a bound $m$ on the
number of turns the game will last, counting moves by both players. Keeping
$L$ as the bound on the number of legal moves in any position, and waving away
the event that different sequences of moves can lead to the same position, we
have an $L$-way branching game tree of size about $N = L^m$. The classical time
bound for exhaustive search to determine whether there is a winning strategy,
one that is able to answer all possible moves by the opponent, is $T = O(L^m)$.
Can a quantum algorithm improve this analogously to $O(\sqrt{T}) = O(L^{m/2})$?

We would like to use (15.1) to set up the following simple-minded recursion 
for the time T = T(n,m ) to traverse the game tree and find a winning move if 
one exists, reporting “no” if not. We may regard n as the maximum of the total 
board size and log N. We pose the question, when is the following valid? 

• $S = O(n)$, immaterial as with a basic Grover search;

• $U = O(1)$, because we merely play a move;

• $C = 1 + T(n,m-1)$ by the desired recursion;

• $E = L$ as for a Grover search in a space of size $L$;

• $D = O(1)$;

$$T(n,m) = S + \sqrt{E}\,(U\sqrt{D} + C) = O\bigl(n + \sqrt{L}\,(1 + T(n,m-1))\bigr) \approx O\bigl(\sqrt{L}\; T(n,m-1)\bigr).$$

If we ignore log factors or just count queries to some table of checkmates,
then the solution would indeed become $O(L^{m/2}) = O(\sqrt{N})$. The validity of
this, however, raises issues that lead to recent research on quantum walks.
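
Taking the simple-minded recursion at face value, one can unroll it numerically and compare against $L^{m/2}$. The sketch below is our own illustration under stated assumptions: it drops the additive $O(n)$ setup term, uses the base case $T(0) = 1$, and sets every hidden constant to 1. It says nothing about the validity issues discussed next.

```python
import math

def game_tree_cost(L, m):
    """Unroll T(m) = sqrt(L) * (1 + T(m - 1)) with the assumed base case T(0) = 1."""
    T = 1.0
    for _ in range(m):
        T = math.sqrt(L) * (1 + T)
    return T

L, m = 16, 6
# The unrolled value exceeds L^(m/2) only by a bounded constant factor.
print(game_tree_cost(L, m), L ** (m / 2))   # 9556.0 4096.0
```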

The first issue is amplification of the success probability, which this chapter
has kept in the background. Even thinking in terms of a simple classical
recursion, if the recursive call and the current-level Grover search are both
tuned for success probability 3/4, then the level falls to only a 9/16 guarantee, 
which fails to maintain the recursion invariant. In Grover’s quantum search, 
however, a constant error rate on the miss vector can have a huge effect. With¬ 
out further tricks, it takes amplification of order 1 — in the base step to 
contain this problem, and the solution originally obtained by Buhrman et al. 
(1998) gave time of order $L^{m/2} n^{m-1}$. Note that because the exponent $m$ is not
fixed, the $n^m$ is not simply an $O$-tilde factor, and in chess (under an analogue of
the “fifty-move rule”), $m$ can be proportional to the total board size. Speedup
needs $n^m L^{m/2} < L^m$, and this is given only when $L > n^2$. These results also apply
to evaluating Boolean formulas that alternate $L$-way OR and AND levels, as in
Buhrman et al. (1998).

Second and more important, when $L$ is small or even constant, it means little
to say that each level’s Grover-style walk is giving time $\sqrt{L}$. Trees of NAND
gates cannot be grouped as trees of OR and AND and XOR can, so they are
stuck with $L = 2$. For a long time, no quantum algorithm for evaluating a full
depth-$d$ tree of binary NAND gates was known to beat the $2^{0.753d}$ time achievable
classically. However, first in an unconventional quantum model and then
in our standard one, the $O(2^{d/2})$ target was approached and then achieved
exactly in the count of queries (Ambainis et al., 2010, Reichardt, 2011b)
together with getting the overall time down to $\tilde{O}(\sqrt{N})$ (Reichardt, 2011a).
These improvements have come from further clever use of quantum phase
estimation and deeper relationships to linear algebra (Reichardt and Spalek,
2008). Thus, for evaluating logic formulas, playing games of strategy, and various
related problems, quantum algorithms can deliver the same near-quadratic
speedup that we first saw for Grover’s search algorithm.


15.11 Problems 

15.1. Show by direct calculation that the matrix $V_R$ is unitary.

15.2. Work out the analogous decomposition of the matrix $V_L$ using an arbitrary
basis state $X_0$ of the “left” node space. What if $X_0$ is not a basis state?

15.3. In the algorithm for element distinctness, suppose we use recursion to
check whether a set of $r$ nodes has two nondistinct function values and thus
constitutes a “hit.” Do you get the same $O(n^{2/3})$ running time or something
less?

15.4. Sketch how to implement the element-distinctness algorithm on the Hamming
graphs $H_{n,r}$, whose vertices are $r$-tuples from $[n]$ and whose edges connect
$r$-tuples that differ in one place. Are all of $S(n)$, $U(n)$, $C(n)$, $E(n)$, and
$D(n)$ asymptotically the same as before?

15.5. Sketch a quantum search algorithm that given three $n \times n$ matrices
$A$, $B$, $C$ finds $i,j$ such that $\sum_k A[i,k]B[k,j] \neq C[i,j]$ if such a pair exists or
else outputs “accept.” You may consider arithmetic operations to be unit time
and need only count steps that query matrix entries. The goal is cost $O(n^{5/3})$.

15.6. Suppose you can make superposed black-box queries to a binary operation
$\circ$ on $[n]$ whose values lie in $[k]$. Sketch a quantum algorithm to find
$a,b,c \in [n]$ such that $(a \circ b) \circ c$ is not equal to $a \circ (b \circ c)$ if any such “nonassociative
triple” exists. If $k = O(1)$, then what is the running time of the algorithm?
Note that the data associated to a Johnson-graph node $(u_1,\dots,u_r)$ can
preserve values of $\circ$ on arguments in $[k]$ as well as the $u_j$.


15.12 Summary and Notes 

This chapter has followed the lead of Magniez et al. (2007) and Magniez et al.
(2011), building on Szegedy (2004). We have chosen selectively from the survey
by Santha (2008). We could have used the quantum amplitude amplification
framework of Brassard et al. (1998) and Brassard et al. (2000) as a bridge
from section 13.5 to section 15.6, where it would give cost $O(\sqrt{E}(S + C))$, but
we chose to keep both sections simpler and self-contained.

The last two problems also come from Santha (2008), while the problem
about element distinctness on the Hamming graphs is based on Childs and
Kothari (2012) and Childs (2013). The seminal paper on element distinctness
appeared in full form as Ambainis (2007). The query complexity of triangle
finding has since been improved from the $O(n^{13/10})$ in section 15.9 to $O(n^{9/7})$
by Lee et al. (2013), while connections to the exponent of matrix multiplication
are explored further by Williams and Williams (2010). The advances for (playing
chess and) evaluating NAND trees and other logical formulas include the
work of Farhi et al. (2008), Childs et al. (2009), Reichardt and Spalek (2008),
and Ambainis et al. (2010). See also the often-updated notes of Childs (2013).

Links to further lecture notes and surveys and texts, including Kitaev et al. 
(2002), Kaye et al. (2007), Childs and van Dam (2008), and Mosca (2009), may 
be found at the “Quantum Algorithms Zoo,” which is maintained by Stephen 
Jordan (http://math.nist.gov/quantum/zoo/). 




Quantum Computation and BQP 


We have presented Shor’s algorithm without giving a full statement of the theorem
it proves. The theorem reads, “Factoring is in BQP.” In words, this means
that factoring has a feasible algorithm with bounded error. The letters BQP
stand for Bounded-Error Quantum Polynomial Time. This is the central complexity
class in quantum complexity theory. This chapter defines it formally,
shows that several other possible definitions are equivalent, and shows its relationship
to longer-studied classes in “classical” complexity theory.


16.1 The Class BQP 

We have already discussed the error probability of a quantum algorithm and
how one can amplify the success probability. Saying “bounded error” entails
formalizing conditions under which one can amplify the success probability of
an algorithm. We define BQP for functions as well as decision problems. The
characteristic function $\chi_L$ of a language $L$ is defined by $\chi_L(x) = 1$ if $x \in L$
and $\chi_L(x) = 0$ otherwise.

Definition 16.1 A function $f : \{0,1\}^* \to \{0,1\}^*$ belongs to BQP if there
are a polynomial $p$, a function $g$ computable in classical $p(n)$ time, and a
quantum algorithm $A$ such that for all $n$ and inputs $x \in \{0,1\}^n$, and for some
$r \le p(n)$, $A$ applied to the initial state $e_{x0^r}$ yields within $p(n)$ basic quantum
operations a quantum state $b$ such that

$$\Pr[\text{measuring } b \text{ yields } z \text{ such that } g(z) = f(x)] \ge \frac{3}{4}. \qquad (16.1)$$

A language belongs to BQP if its characteristic function does.

The 3/4 is arbitrary; it can be replaced by $1/2 + \epsilon$ for any fixed $\epsilon > 0$, and
in some contexts where the relation $f(x) = y$ is already known to be in BQP, it
can be lower. We have differed from standard sources in making the classical
post-processing explicit. This accords better with our presentation of Shor’s
algorithm and makes the amplification of the success probability transparent.

To move from the general notion of a quantum algorithm to specific models
of quantum circuits, however, we have to address the gates, control the success
probability from measurements, and wean off the classical parts. The following
theorem makes a statement doing so and, hence, removes our need to specify
the underlying quantum machine or circuit model any further. A collection
$[C_n]$ of circuits is uniform if the mapping from $n$ to a description of $C_n$ is
computable in classical polynomial time.

THEOREM 16.2 For any set $S$ of quantum gates that includes the Hadamard
gate and adds either (i) the Toffoli gate, (ii) the controlled-phase gate CS,
or (iii) CNOT and the T-gate, every function $f$ in BQP is computable by
uniform circuits $[C_n]$ of polynomially many gates in $S$, with 3/4 replaced by
$1 - \epsilon(n)$ provided $\epsilon(n) \ge \exp(-n^k)$ for some $k$ and all $n$, and with a single
measurement without post-processing in which the value appears in the first
$|f(x)|$-many qubit lines.

We can sketch much of the proof, although full details inevitably depend 
on whatever model one chose to specify quantum algorithms A to begin with. 
Various models were used until quantum circuits gained ascendancy. We could 
have obviated the missing details by using circuits of (i), (ii), and/or (iii) gates 
(and nothing else) as our model for quantum algorithms to begin with, but 
doing so would have cramped both history and this text’s style. 

Proof. Regarding the equivalence of the three gate sets, the details have been
worked out in the exercises of chapters 6 and 7. That each of them is universal,
meaning capable of close enough approximation of quantum circuits $C$ using
any other finite set of basic gates to preserve (16.1), also follows from ideas
in these exercises as summarized in the end notes to chapter 6. Moreover, the
Solovay-Kitaev theorem gives an algorithm to produce the new circuit efficiently
while multiplying the size $s$ of $C$ by less than a constant times $(\log s)^4$.
This algorithm applied for the gate sets (ii) or (iii) yields full approximate simulations
of operators on complex Hilbert spaces, while for (i) using real spaces
it just preserves the measurements needed for (16.1).

The exercises have also worked out how to decompose the quantum Fourier
transform as a composition of $O(n^2)$-many one- and two-qubit gates. The representation
obtained in problem 6.13 does not use a finite set of gates because
the twists $T_\alpha$ are used with the angles $\alpha = \pi/2^{n-1}$ being exponentially fine.
However, the Solovay-Kitaev process also applies to these gates, and the same
efficiency giving the $(\log s)^{O(1)}$ overhead enables it to achieve approximation
as an operator on $\mathbb{C}^N$ using only $(\log N)^{O(1)} = n^{O(1)}$ gates from sets (ii) or
(iii). This gives more than required because BQP need only satisfy (16.1) in
the measurements, so the issues with complex angles get flattened out when
everything is done in the real Hilbert space $\mathbb{R}^{2N}$.

It remains to discuss the amplification to error at most $1/2^{n^k}$. This comes
from the ability to clone the basis state given as input into order-$n^k$ many
copies, run the quantum gates in parallel on the copies, measure to get a vector
of outputs $z_i$, and take the majority vote of the final outputs $g(z_i)$ to yield
$f(x)$. The last and trickiest fact is that this probability can be amplified without
relying on classical majority vote in a post-processing step. Instead, the idea of
deferred measurement is employed to do repeated trials and accumulate results
within the circuit. Because the majority vote is in classical polynomial time,
theorem 5.4 is implicitly used to bring the post-processing with majority vote
within the quantum circuit as well. Hence, a single measurement ultimately
suffices to yield $f(x)$ with the amplified success probability. □
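
To see why majority voting drives the error down so quickly, one can compute the failure probability of the vote exactly from the binomial distribution. The sketch below is our own classical illustration and not part of the proof: it assumes independent trials that each succeed with probability 3/4, treats every outcome other than the correct one as a failure, and the function name `majority_error` is hypothetical.

```python
from math import comb

def majority_error(p_success, trials):
    """Probability that at most half of `trials` independent runs succeed."""
    q = 1 - p_success
    return sum(comb(trials, k) * p_success**k * q**(trials - k)
               for k in range(trials // 2 + 1))

# The failure probability shrinks exponentially in the number of trials.
for t in (1, 11, 51, 101):
    print(t, majority_error(0.75, t))
```

By Hoeffding's inequality the failure probability with $t$ trials is at most $\exp(-t/8)$ here, so order-$n^k$ trials suffice for error $1/2^{n^k}$ up to the constant in the exponent.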

In the case of languages $L$, the conditions for $L \in$ BQP, together with the
achievable amplification, look like this: For any input $x \in \{0,1\}^n$ together
with $r = n^{O(1)}$ ancilla qubits, write $a_x$ as short for $e_{x0^r}$ and write $U$ for the
$2^{n+r} \times 2^{n+r}$ dimensional unitary transformation the circuit $C$ computes. By
the compute-uncompute trick in section 6.3, we can arrange for acceptance to
yield $a_x$ again as output. Then $C$ being a BQP-circuit for $L$ (with amplification)
is equivalent to the conditions that for all $x \in \{0,1\}^n$:

$$x \in L \implies |a_x \cdot Ua_x|^2 \ge 1 - \epsilon(n),$$

$$x \notin L \implies |a_x \cdot Ua_x|^2 \le \epsilon(n).$$

This form facilitates comparing BQP with other complexity classes, which
are most commonly defined in terms of languages. We can always associate
a language $L_f$ to a function $f$ so that $f(x)$ can be computed efficiently via a
subroutine for whether strings combining $x$ with incrementally built binary
strings $w$ belong to $L_f$. We did this with the factoring problem in the first part
of chapter 4. Thus, functions and languages are usually considered notionally
equivalent in complexity theory.


16.2 Equations, Solutions, and Complexity 

Let us consider equations involving polynomials $p(y_1,\dots,y_n)$ and solutions
where every variable $y_i$ is 0 or 1. Given such a $p$, here are several questions we
can ask about it.

(a) Is $p(0,\dots,0) = 0$?

(b) Does there exist a solution $a \in \{0,1\}^n$ such that $p(a) = 0$?

(c) Are all assignments solutions to the equation $p(y) = 0$?

(d) Are over half of the assignments a solutions? 

(e) How many solutions are there? 

We can also pose these questions within certain contexts, such as when the
following promise condition is known to hold in advance: 

(f) Either at least 75% of the a are solutions or at most 25% of them are. 

In standard presentations of computational complexity, the following is 
a theorem based on some model-specific definition of the classes. We will 
instead adopt it as a definition giving a shortcut to formulations that are most 
useful for framing quantum algorithmic power. 

Definition 16.3 A language $L$ or function $f$ belongs to the stated complexity
class if there is a classically feasible function $g$ such that for all $n$ and
$x \in \{0,1\}^n$, $g(x)$ produces a polynomial $p(y_1,\dots,y_m)$ such that:

(a) P: $x \in L \iff$ the answer to (a) is yes.

(b) NP: $x \in L \iff$ the answer to (b) is yes.

(c) co-NP: $x \in L \iff$ the answer to (c) is yes.

(d) PP: $x \in L \iff$ the answer to (d) is yes.

(e) #P: $f(x) =$ the number of solutions.

(f) BPP: $x \in L \iff$ the answer to (d) is yes, where (f) holds for all $x$.
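
For intuition, the questions (a) through (f) can all be answered by brute force on a tiny polynomial. The sketch below is our own illustration; the function name `classify` and the sample polynomial are assumptions made for the example. The enumeration takes $2^m$ steps, so it only illustrates what the questions ask, not how the classes are feasibly decided.

```python
from itertools import product

def classify(p, m):
    """Answer questions (a)-(f) for a polynomial p on m Boolean variables by brute force."""
    assignments = list(product((0, 1), repeat=m))
    solutions = sum(1 for a in assignments if p(*a) == 0)
    total = len(assignments)
    return {
        "a": p(*([0] * m)) == 0,        # is the all-zero assignment a solution?
        "b": solutions > 0,             # does some solution exist?
        "c": solutions == total,        # are all assignments solutions?
        "d": 2 * solutions > total,     # are over half of them solutions?
        "e": solutions,                 # how many solutions are there?
        # does the promise (f) hold: at least 75% or at most 25% solutions?
        "f": 4 * solutions >= 3 * total or 4 * solutions <= total,
    }

# Hypothetical example: p(y1, y2, y3) = y1 * y2 * (1 - y3), nonzero only at (1, 1, 0).
answers = classify(lambda y1, y2, y3: y1 * y2 * (1 - y3), 3)
print(answers)
```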

The freedom to choose the reduction function $g$ allows some manipulation
of equations, such as adding dummy variables or making terms that
force certain arguments to certain values in order for a solution to be possible.
Doing so shows relations between the questions and hence the classes.
For instance, question (a) can be transformed to a case of (c) upon replacing $p$
by $p' = p \cdot (1 - y_1)(1 - y_2) \cdots (1 - y_m)$. Since $p'(a) = 0$ automatically whenever some
$a_i = 1$, while $p'(0,\dots,0) = p(0,\dots,0)$, we have $p(0,\dots,0) = 0$ if and only if all
binary assignments $a$ make $p'(a) = 0$. A similar idea transforms (a) into (b), so
we conclude:

P $\subseteq$ NP $\cap$ co-NP.
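
The transformation from question (a) to question (c) can be checked mechanically on small examples. The sketch below is our own illustration; the helper names and the two sample polynomials are assumptions. It builds the product of $p$ with all the factors $(1 - y_i)$ and confirms that the answer to (a) for $p$ matches the answer to (c) for the new polynomial.

```python
from itertools import product

def reduce_a_to_c(p):
    """Return p'(y) = p(y) * (1 - y_1) * ... * (1 - y_m)."""
    def p_prime(*y):
        value = p(*y)
        for yi in y:
            value *= (1 - yi)
        return value
    return p_prime

def all_zero_is_solution(p, m):
    """Question (a): is p(0, ..., 0) = 0?"""
    return p(*([0] * m)) == 0

def every_assignment_is_solution(p, m):
    """Question (c): is p(a) = 0 for every Boolean assignment a?"""
    return all(p(*a) == 0 for a in product((0, 1), repeat=m))

# Two hypothetical test polynomials on 2 variables.
p1 = lambda y1, y2: y1 + y2        # p1(0, 0) = 0, so (a) is yes
p2 = lambda y1, y2: 1 - y1 * y2    # p2(0, 0) = 1, so (a) is no
for p in (p1, p2):
    assert all_zero_is_solution(p, 2) == every_assignment_is_solution(reduce_a_to_c(p), 2)
```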

That NP $\subseteq$ PP is a bit trickier but uses dummy variables $z$ to make any solution
$y$ for $p(y) = 0$ the feather that tips the scales making over half the assignments
to $p'(y,z)$ be solutions. Because flipping $p(y)$ to be $1 - p(y)$ flips the answers to
questions (a) and (d), the classes P and PP are closed under complementation
of their member languages, and it follows also that co-NP $\subseteq$ PP. Given any
language $L \in$ NP, these tricks create a reduction function $g'$ such that for all $x$,
$x \in L \iff g'(x) = p'$ belongs to the language $L_d$ of polynomials $p'$ for which
over half the assignments $a$ make $p'(a) = 0$. This is summarized by saying that
$L_d$ is NP-hard. We have defined things so that the language $L_b$ of $p$ for which
there exists a solution is immediately NP-hard, and because $L_b$ also belongs to
NP, it is NP-complete.¹

That BPP $\subseteq$ PP is immediate by relaxing the promise condition (f), and
BPP is likewise closed under complement. Because (f) is maintained in the
trivial reduction to (d) from (a), we have P $\subseteq$ BPP, but the methods used in
going from (b) or (c) to (d) do not preserve it, and neither NP nor co-NP is
known to be contained in BPP.

Questions about whether there are at least $k$ solutions can be tweaked into
the form (d) via dummy variables. Binary search on whether there are at least
$k$ solutions then enables feasibly counting their number, so question (e) (which
clearly subsumes the other questions) is roughly equivalent to (d). Technically,
one cannot compare PP to #P directly, because the latter is a function class,
and getting multiple values of the form (e) might be more powerful than a
single question (d). However, asking for the difference between the numbers
of solutions and nonsolutions stays within the power of (d), a fact we use to
conclude BQP $\subseteq$ PP. For now we note:

Theorem 16.4 BPP $\subseteq$ BQP. □

The polynomial equations that arise in this development have other uses for 
analyzing quantum circuits, and we explore them next. 


16.3 A Circuit Labeling Algorithm 

We give an algorithm for labeling a quantum circuit algebraically. The end
result is a polynomial equation for which the difference of two numbers of
solutions yields the circuit’s acceptance probability. We first give a form in
which the polynomials have values $+1$ and $-1$, which correspond to the signs in
the sum-over-paths explication we gave for quantum measurement and effects
in chapter 7. This uses the value 0 to eliminate impossible paths. Then we show
in section 16.5 how to make do without the value $-1$ and reduce the degrees of
the equations drastically.


¹ With respect to a machine-model definition of NP this is a theorem, indeed an offshoot
of the fundamental Cook-Levin theorem as explored in this chapter’s exercises. We remark
also that in all these cases, we can obtain a feasible function $g''$ that first builds a polynomial
$p(x_1,\dots,x_n,y_1,\dots,y_m)$ that depends only on $n$, and then obtains $p_x = p(y_1,\dots,y_m)$ by substituting
the actual value of each bit $x_i$ of the given $x$ for the corresponding formal variable $x_i$.






The algorithm goes in stages for each new gate working left to right, that is, 
from what we regard as inputs to outputs. It assumes the wires into a gate have 
already been labeled, and it labels the gate’s outgoing wires, with unaffected 
qubit wires keeping their labels. By theorem 16.2 we can restrict attention to 
Hadamard and Toffoli gates, although we include CNOT because it is useful 
to show, and the exercises treat how to extend this for other quantum gates. We 
let $u_i$, $u_j$, $u_k$ stand for the current labels on the qubits $i$, $j$, and/or $k$ involved in a 
gate. 

1. Label the inputs with variables $x_1,\dots,x_n$. If there are ancilla qubits, then 
continue labeling them $x_{n+1},\dots,x_m$, although if they will always be initialized 
to 0, then one can label them 0 straightaway. 

2. Label the outputs with variables $z_1,\dots,z_n$, again using more if there are 
more qubits. From here on we will not need to address ancilla qubits as 
special and will just say “$n$” for the end index. 

3. Let $h$ be the number of Hadamard gates in the circuit, and allocate variables 
$y_1,\dots,y_h$. 

4. Initialize a polynomial $P$, called the global phase polynomial, to the constant 1. 

5. For the next Hadamard gate $H_j$ on some qubit line $i$, allocate the fresh 
variable $y_j$, multiply $P$ by the factor $(1 - 2u_iy_j)$, and make $y_j$ the new label 
on line $i$. 

6. For a CNOT gate, leave the control label $u_i$ unchanged, but change $u_j$ to 
$u_i + u_j - 2u_iu_j$. There is no change to $P$. 

7. For a Toffoli gate with controls on lines $i,j$, leave $u_i$ and $u_j$ alone, but change 
the target $u_k$ to $u_k + u_iu_j - 2u_iu_ju_k$. There is no change to $P$. 

8. When done with all the gates, for each $i$, create the measurement constraint 
$e(u_i,z_i)$, where $u_i$ is the last label on line $i$ and 

$$e(u,z) = 2uz + 1 - u - z.$$

Note that $e(0,0) = e(1,1) = 1$, whereas $e(1,0) = e(0,1) = 0$, so these 
enforce equality of the final labels and the outputs on Boolean outcomes. 
The final polynomial $R = R_C$ is the product of $P$ and all the measurement 
constraints. 
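
The eight steps can be sketched in code. The following is a minimal illustration, not the authors' implementation: labels are held as Python functions of the bit tuples $x$ and $y$ rather than as symbolic polynomials, lines are numbered from 0, and the names `label_circuit`, `e`, and the gate tuples are assumptions made for this sketch.

```python
from itertools import product

def e(u, z):
    """Measurement constraint e(u, z) = 2uz + 1 - u - z (equals 1 iff u == z on bits)."""
    return 2 * u * z + 1 - u - z

def label_circuit(n, gates):
    """Label an n-qubit circuit given as a list of gate tuples.

    Hypothetical gate encoding: ("H", i), ("CNOT", i, j) with control i and
    target j, ("TOF", i, j, k) with controls i, j and target k (0-based lines).
    Returns (h, R): the number of Hadamard gates and a function R(x, y, z)
    that evaluates the final polynomial on bit tuples.
    """
    # Each label is a function of (x, y); line i starts with label x_i.
    labels = [lambda x, y, i=i: x[i] for i in range(n)]
    factors = []  # factors of the global phase polynomial P
    h = 0
    for gate in gates:
        if gate[0] == "H":
            i = gate[1]
            u, j = labels[i], h
            # Multiply P by (1 - 2*u*y_j) and relabel the line with y_j.
            factors.append(lambda x, y, u=u, j=j: 1 - 2 * u(x, y) * y[j])
            labels[i] = lambda x, y, j=j: y[j]
            h += 1
        elif gate[0] == "CNOT":
            _, i, j = gate
            ui, uj = labels[i], labels[j]
            labels[j] = lambda x, y, ui=ui, uj=uj: (
                ui(x, y) + uj(x, y) - 2 * ui(x, y) * uj(x, y))
        elif gate[0] == "TOF":
            _, i, j, k = gate
            ui, uj, uk = labels[i], labels[j], labels[k]
            labels[k] = lambda x, y, ui=ui, uj=uj, uk=uk: (
                uk(x, y) + ui(x, y) * uj(x, y)
                - 2 * ui(x, y) * uj(x, y) * uk(x, y))
        else:
            raise ValueError(f"unknown gate {gate!r}")

    def R(x, y, z):
        # Product of P and all measurement constraints e(u_i, z_i).
        value = 1
        for f in factors:
            value *= f(x, y)
        for i in range(n):
            value *= e(labels[i](x, y), z[i])
        return value

    return h, R

# Hadamard on the first line, then CNOT from the first line to the second.
h, R = label_circuit(2, [("H", 0), ("CNOT", 0, 1)])
for z in product((0, 1), repeat=2):
    print(z, [R((0, 0), y, z) for y in product((0, 1), repeat=h)])
```

On input 00 only the outputs 00 and 11 receive a nonzero value of $R$ for some $y$, so the two output bits always agree. Evaluating by enumeration keeps the sketch dependency-free; a symbolic algebra package would be the natural choice if the expanded polynomial itself is wanted.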


We may instead consider the measurement constraints to be defined at the 
beginning as $e(x_i,z_i)$ for all $i$, and whenever a label $u_i$ is changed to $v_i$, the 
constraint $e(u_i,z_i)$ is changed to $e(v_i,z_i)$. Here are two examples of the labeling 
algorithm. The first puts something in place of the “?” label in the circuit 
example from section 7.6, which showed entanglement: 


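[Circuit diagram (16.2): line 1 enters with label $x_1$, passes through a Hadamard 
gate that relabels it $y$, and exits at $z_1$; line 2 enters with label $x_2$, is the target 
of a CNOT controlled by line 1 that relabels it $y + x_2 - 2yx_2$, and exits at $z_2$.] 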


$$P = 1 - 2x_1y$$

$$\begin{aligned}
R &= P \cdot e(y,z_1)\,e(y + x_2 - 2yx_2,\, z_2) \\
  &= (1 - 2x_1y) \cdot (2yz_1 + 1 - y - z_1) \\
  &\quad \cdot \bigl(2(y + x_2 - 2yx_2)z_2 + 1 - y - x_2 + 2yx_2 - z_2\bigr).
\end{aligned}$$

When we substitute the input 00, that is, $a_1 = 0$ for $x_1$ and $a_2 = 0$ for $x_2$, 
this simplifies to $P = 1$ and $R = e(y,z_1)e(y,z_2)$. We can infer from this that the 
only allowed output states have $z_1 = z_2$, which shows the entanglement. 

The following larger example is adapted from two sources that treated the 
output constraints as separate equations: 



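[Circuit diagram: a three-qubit circuit with outputs $z_1$, $z_2$, $z_3$.] 


$$P = (1 - 2x_1y_1)(1 - 2x_2y_2)(1 - 2y_1y_3)\bigl(1 - 2y_4(y_1y_2 + x_3 - 2y_1y_2x_3)\bigr)$$

$$R = P \cdot e(y_2y_4 + y_3 - 2y_2y_3y_4,\, z_1)\,e(y_2,z_2)\,e(y_4,z_3).$$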


16.4 Sum-Over-Paths and Polynomial Roots 

Now let $U_1U_2\cdots U_s$ be the matrix representation of a quantum circuit, that is, 
as a product of $N \times N$ matrices, and let $a,b \in [N]$. Consider a path 

$$U_1[a,c_1]\,U_2[c_1,c_2] \cdots U_{s-1}[c_{s-2},c_{s-1}]\,U_s[c_{s-1},b].$$

Call the path positive if it multiplies out to $+1$, negative if it multiplies out to 
$-1$, and zero if one of the entries is 0, which is the only other possibility for 
circuits of Hadamard, CNOT, and Toffoli gates. Let $p^+(a,b)$ be the number 
of positive paths from $a$ to $b$ and $p^-(a,b)$ the number of negative paths. We 
will again use the “Phil the mouse” visualization of these paths; recall that 
the “maze gadgets” for these gates were also depicted in chapter 7. In this 
section we will conserve these quantities individually, not just their difference 
$p^+(a,b) - p^-(a,b)$, which gives the amplitude of “surviving Phils.” 
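
To make the path counts concrete, here is a small brute-force sketch. It assumes the unnormalized Hadamard matrix with entries $\pm 1$ and uses a helper name, `count_paths`, chosen for this illustration; it enumerates every intermediate index sequence $c_1,\dots,c_{s-1}$ and classifies each path by the product of its entries.

```python
from itertools import product

# Unnormalized Hadamard matrix; entries are +1 or -1.
H = [[1, 1], [1, -1]]

def count_paths(matrices, a, b):
    """Return (p_plus, p_minus): numbers of paths from a to b whose entry
    product is +1 or -1. Paths through a 0 entry are not counted."""
    size = len(matrices[0])
    p_plus = p_minus = 0
    for mids in product(range(size), repeat=len(matrices) - 1):
        nodes = (a, *mids, b)
        value = 1
        for U, c, d in zip(matrices, nodes, nodes[1:]):
            value *= U[c][d]
        if value == 1:
            p_plus += 1
        elif value == -1:
            p_minus += 1
    return p_plus, p_minus

# Two Hadamard gates in a row on one qubit.
print(count_paths([H, H], 0, 0))  # (2, 0)
print(count_paths([H, H], 0, 1))  # (1, 1)
```

The two paths from 0 to 0 reinforce each other, while the two paths from 0 to 1 cancel, which is the interference that makes two Hadamard gates act as the identity once the counts are normalized.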

Finally, referencing the polynomial $R$, define $N_R[+1 \mid a; b]$ to be the number 
of assignments $y$ to $y_1,\dots,y_h$ that make $R(a; y; b) = 1$. Define $N_R[-1 \mid a; b]$ 
similarly for $R(a; y; b) = -1$. Here $R(a; y; b)$ means we are substituting $a$ for 
$x_1,\dots,x_n$ and $b$ for $z_1,\dots,z_n$.^2 

The following technical lemma connects the sum-over-paths formulation 
with the numbers of solutions to the equations $R = 1$ and $R = -1$: 

Lemma 16.5 For all circuits $C$ on $n$ qubits as above, and $a, b \in \{0,1\}^n$, 

$$\begin{aligned}
p^+(a,b) &= N_R[+1 \mid a; b], \\
p^-(a,b) &= N_R[-1 \mid a; b].
\end{aligned} \tag{16.3}$$

Proof. We work inductively as $C$ is built gate-by-gate from an initially empty 
circuit $C_0$. $C_0$ has one positive path from $a$ to $a$ for each $a$, none from $a$ to $b$ 
when $b \neq a$, and no negative paths anywhere. The initial polynomial $R_0$ is 

$$\prod_{i=1}^{n} e(x_i,z_i)$$

because there are no $y$ variables. Whenever $a = b$, $R_0(a,b) = 1$, whereas $a \neq b$ 
means $a_i \neq b_i$ for some $i$, whereupon $e(a_i,b_i)$ zeros out the product. We 
technically satisfy $N_R[+1 \mid a; a] = 1$ for each $a$ because there is exactly one 
$y \in \{0,1\}^0$, namely, the empty string, whereas $N_R[-1 \mid a; a] = 0$ because nothing 
gives a value of $-1$. Thus, the lemma’s properties hold for $C_0$. 

For the induction, let $C, P, R$ satisfy the equalities in the lemma, and suppose 
we obtain a new circuit $C'$ first by adding one Hadamard gate on qubit line $i$. 
Let $u$ denote the label before the gate on that line. Let us fix an input $a$ and 
output $b$ except for the value $b_i$ on that line. That is, we consider the two 
outputs $b[b_i = 0]$ and $b[b_i = 1]$. Let $p_0^+, p_0^-$ stand for the numbers of paths of $C$ 
from $a$ to $b[b_i = 0]$ that multiply out to $+1$ and $-1$, respectively. Write $p_1^+, p_1^-$ 
similarly for the case $b_i = 1$, and let $q_0^+, q_0^-, q_1^+, q_1^-$ denote the corresponding 
quantities in the new circuit $C'$. 

2 The exercises explore ideas such as if one intends only to measure the first qubit, then one 
may delete the measurement constraint factors other than $e(u_1,z_1)$, or alternatively one may leave 
outputs other than $z_1$ unsubstituted and e.g. define $N_R[-1 \mid a; b_1]$ to be the number of assignments 
to $y_1,\dots,y_h$ and $z_2,\dots,z_n$ that yield $-1$ when $b_1$ is substituted for $z_1$ only. 

Now we can visualize the maze gadgets for Hadamard gates from chapter 7, 
which here have a $-1$ path from $b[b_i = 1]$ for $C$ to the same terminal for $C'$ 
and three positive paths involving the $b[b_i = 0]$ terminals. The maze corridors 
simply carry the values of the Hadamard matrix applied to line $i$, so we have: 

$$\begin{aligned}
q_0^+ &= p_0^+ + p_1^+ & q_1^+ &= p_0^+ + p_1^- \\
q_0^- &= p_0^- + p_1^- & q_1^- &= p_0^- + p_1^+.
\end{aligned} \tag{16.4}$$


In terms of the new polynomials $P'$ and $R'$ for $C'$, we have $P' = P \cdot (1 - 2uy_h)$, 
and $R'$ replaces $e(u,z_i)$ with $e(y_h,z_i)$. With $a$ fixed, let $S_0$ denote the set of 
assignments to $y_1,\dots,y_{h-1}$ that make $u = 0$ and make $P$ have value $+1$. Let $T_0$ 
similarly stand for $u = 0$ and $P$ having the value $-1$, and $S_1, T_1$ likewise for 
$u = 1$. Now let $S'_0$ instead denote the assignments to $y_1,\dots,y_h$ with $y_h = 0$ that 
give $P'$ the value $+1$ and $T'_0$ those with $y_h = 0$ that give $-1$; note that these arise 
only for $b_i = 0$. Finally, let $S'_1, T'_1$ denote the corresponding sets with $y_h = 1$. 
An assignment in $S'_0$ is free to make $u$ have either value because $y_h = 0$ makes 
$P'$ have the same values as $P$, and applying this to the case $b_i = 0$ takes care 
of the $e(y_h,z_i)$ term. The similar observation for $T'_0$ gives us likewise a 1-to-1 
correspondence, and with a slight abuse of notation because $y_h = 0$ is fixed, 
we write them as disjoint unions: 

$$S'_0 = S_0 \uplus S_1, \qquad T'_0 = T_0 \uplus T_1.$$

Now for the case $b_i = 1$, we need $y_h = 1$, and this flips the sign of assignments 
that also make $u = 1$. We therefore obtain: 

$$S'_1 = S_0 \uplus T_1, \qquad T'_1 = T_0 \uplus S_1.$$

By the properties for $C$, we have $p_0^+ = \|S_0\|$, $p_0^- = \|T_0\|$, $p_1^+ = \|S_1\|$, and 
$p_1^- = \|T_1\|$. Substituting these into the right-hand sides of (16.4) yields the 
goal identities for $C'$ that correspond to (16.3) for $C$. 

In the case of a CNOT (respectively, Toffoli) gate with target on line $i$ and 
control(s) on line $j$ (and $k$), fix any $a, b$, and let $b'$ have $b'_i = b_i \oplus b_j$ (respectively, 
$b'_i = b_i \oplus b_jb_k$). It is incidental but helpful to note in both cases that the 
map from $b$ to $b'$ is its own inverse. The last “maze stage” for $C'$ shunts paths 
of $C$ ending at $b$ to $b'$, with no branching or sign change, and vice versa. Hence, 
with similar notation to before, we have: 

$$q^+_{b'} = p^+_b; \qquad q^-_{b'} = p^-_b.$$

By the induction hypothesis, $p^+_b$ is equal to the number of assignments 
to $y_1,\dots,y_h$ that make $P$ have value 1, that is, to $N_R[+1 \mid a; b]$. Similarly, 
$p^-_b = N_R[-1 \mid a; b]$. We have $P' = P$ and no new variable, so the only difference 
is that $e(u,z_i)$ in $R$ is replaced by $e(u \oplus v, z_i)$ (respectively, $e(u \oplus vw, z_i)$) in $R'$, 
where $u$ is the previous label on line $i$, and $v, w$ are the unchanged labels on the 
control(s). The changes make 

$$N_R[+1 \mid a; b] = N_{R'}[+1 \mid a; b']; \qquad N_R[-1 \mid a; b] = N_{R'}[-1 \mid a; b'].$$

Because the left-hand sides are equal to $p^+_b$ and $p^-_b$, respectively, the right-hand 
sides are equal to $q^+_{b'}$ and to $q^-_{b'}$, as needed to be proved. □ 
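
Lemma 16.5 can be checked by brute force on the small Hadamard + CNOT circuit (16.2). The sketch below is only a sanity check under stated assumptions: basis states are indexed as $2 \cdot \text{bit}_1 + \text{bit}_2$, the Hadamard matrix is left unnormalized, and the function names are chosen for this illustration.

```python
from itertools import product

def e(u, z):
    """Measurement constraint: 1 when the bits u and z agree, 0 otherwise."""
    return 2 * u * z + 1 - u - z

def R(x, y, z):
    """Final polynomial of circuit (16.2): Hadamard on line 1, then CNOT."""
    (x1, x2), (z1, z2) = x, z
    return (1 - 2 * x1 * y) * e(y, z1) * e(y + x2 - 2 * y * x2, z2)

def solution_counts(x, z):
    """(N_R[+1 | x; z], N_R[-1 | x; z]) over the single variable y."""
    values = [R(x, y, z) for y in (0, 1)]
    return values.count(1), values.count(-1)

# Hadamard on line 1 tensored with the identity on line 2 (unnormalized).
H1 = [[1, 0, 1, 0],
      [0, 1, 0, 1],
      [1, 0, -1, 0],
      [0, 1, 0, -1]]
# CNOT with control on line 1 and target on line 2.
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

def path_counts(a, b):
    """(p+, p-) for paths H1[a][c] * CNOT[c][b] over the intermediate index c."""
    products = [H1[a][c] * CNOT[c][b] for c in range(4)]
    return products.count(1), products.count(-1)

def lemma_holds():
    """Compare path counts with solution counts for all 16 input/output pairs."""
    for x, z in product(product((0, 1), repeat=2), repeat=2):
        a, b = 2 * x[0] + x[1], 2 * z[0] + z[1]
        if path_counts(a, b) != solution_counts(x, z):
            return False
    return True

print(lemma_holds())
```

For instance, input 10 and output 11 has exactly one path, of sign $-1$, and exactly one assignment ($y = 1$) with $R = -1$.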

As is already evident from the circuit examples in section 16.3, the multiplication 
makes the degree of $P$ ramp up as gates are added. The number of terms 
ramps up even more if the product is multiplied out. We use one more wrinkle 
to make the terms add rather than multiply. 


16.5 The Additive Polynomial Simulation 

This is a short section but gives in some sense the tightest classical rendition 
of nature’s quantum computing power. It maps into the additive structure of 
integers modulo $k = 2$ rather than the multiplicative structure of $+1, -1$. The 
new wrinkle is that if an equality constraint $e(u_i,z_i)$ is going to be violated, 
then let us allocate a fresh variable $w_i$ and add the term 

$$w_i(1 - e(u_i,z_i)) = w_i(u_i + z_i - 2u_iz_i),$$

which further becomes simply $w_i(u_i + z_i)$ under addition modulo 2. Consider 
now assignments $Z$ to all the $x; y; z$ variables that make $u_i \neq z_i$. The multiplier 
of $w_i$ becomes $+1$, so the final $R$ will have the form $R' + w_i$, with $w_i$ appearing 
nowhere else in $R$. For every assignment that violates the constraint, one value 
of $w_i$ will give 1 and the other will give 0, so they will cancel out with regard 
to the difference 

$$N_R[0 \mid a; b] - N_R[1 \mid a; b].$$






Now in place of $P$ we initialize an additive phase polynomial $Q$ to 0, and the 
only other changes we need to make to the labeling algorithm are: 

1. For a Hadamard gate on line $i$, with $u_i$ and $y_j$ as before, add to $Q$ the term 
$u_iy_j$. 

2. The label change on a CNOT gate simplifies to $u'_j = u_i + u_j$, with again no 
change to $Q$. 

3. For a Toffoli gate with controls on lines $i,j$ and target on line $k$, the new 
target label is $u'_k = u_k + u_iu_j$. 

4. At the end, $R = Q + \sum_i w_i(1 - e(u_i,z_i))$. 

For example, in the simple Hadamard + CNOT circuit diagram (16.2) in 
section 16.3, the label $y + x_2 - 2yx_2$ becomes simply $y + x_2$. The polynomials 
of the larger circuit diagram in section 16.3 become: 

$$Q = x_1y_1 + x_2y_2 + y_1y_3 + (y_1y_2 + x_3)y_4;$$

$$R = Q + w_1(y_3 + y_2y_4 + z_1) + w_2(y_2 + z_2) + w_3(y_4 + z_3).$$

Lemma 16.6 For all quantum circuits $C$ on $n$ qubits as above and $a,b \in \{0,1\}^n$, 

$$p^+(a,b) - p^-(a,b) = N_R[0 \mid a; b] - N_R[1 \mid a; b].$$

Proof. The verification details are a direct carryover from the proof of 
lemma 16.5, except that the presence of the $w_i$ variables implies equivalence 
only for the difference $p^+ - p^-$, not for the individual $+$ and $-$ path counts as 
before. □ 
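
The additive version can likewise be tried by enumeration on the Hadamard + CNOT circuit (16.2). This sketch assumes that the counts range over the $w$ variables as well as $y$; under that reading each satisfied constraint leaves its $w_i$ free, so the count difference comes out as $2^m$ times the path difference (here $m = 2$), which appears consistent with the $2^{h+2m}$ normalization used in the proof of theorem 16.7. The table `AMPLITUDE` and the function names are assumptions of the illustration.

```python
from itertools import product

def R_additive(x, y, z, w):
    """Additive polynomial of circuit (16.2), evaluated modulo 2."""
    (x1, x2), (z1, z2), (w1, w2) = x, z, w
    Q = x1 * y                  # Hadamard contributes u*y with u = x1
    u1, u2 = y, y + x2          # final labels; the CNOT label is y + x2
    return (Q + w1 * (u1 + z1) + w2 * (u2 + z2)) % 2

def count_difference(x, z):
    """N_R[0 | x; z] - N_R[1 | x; z], counting over y, w1, and w2."""
    diff = 0
    for y, w1, w2 in product((0, 1), repeat=3):
        diff += 1 if R_additive(x, y, z, (w1, w2)) == 0 else -1
    return diff

# Path differences p+ - p- of the unnormalized circuit matrix, with rows
# indexed by input 2*x1 + x2 and columns by output 2*z1 + z2.
AMPLITUDE = [[1, 0, 0, 1],
             [0, 1, 1, 0],
             [1, 0, 0, -1],
             [0, 1, -1, 0]]

bits2 = list(product((0, 1), repeat=2))
matches = all(
    count_difference(x, z) == 4 * AMPLITUDE[2 * x[0] + x[1]][2 * z[0] + z[1]]
    for x in bits2 for z in bits2
)
print(matches)
```

Assignments that violate a constraint contribute equally to the 0 and 1 counts and drop out of the difference, which is the cancellation described above.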


16.6 Bounding BQP 

Our upper bound on BQP follows quickly from the characterization of quantum 
circuits by Hadamard and Toffoli gates in theorem 16.2. We restate the 
lower bound from theorem 16.4 as well. 

Theorem 16.7 BPP $\subseteq$ BQP $\subseteq$ PP. 

Proof. Given feasible quantum circuits $C$ for a BQP algorithm with $h$-many 
nondeterministic (i.e., Hadamard) gates, we may assume that on any input 
$x \in \{0,1\}^n$, $C$ starts with $x0^{m-n}$ and is measured so that the result $b = 10^{m-1}$ 
gives acceptance. This uses results in chapter 6, especially section 6.3. By 
lemma 16.5, we quickly obtain a polynomial $R(x;y;z)$ such that upon substituting 
$a = x0^{m-n}$ for $x$ and $b$ for $z$, the acceptance probability is the square 
of the difference between $N_R[+1 \mid a; b]$ and $N_R[-1 \mid a; b]$, divided by $2^h$. (Or 
by lemma 16.6, we obtain $R(w;x;y;z)$ such that the probability is 
$(N_R[0 \mid a; b] - N_R[1 \mid a; b])^2$ divided by $2^{h+2m}$ where $m$ is the number of $w_j$ variables 
employed.) Computing this difference exactly can be done in PP, and this suffices 
to distinguish whether the circuit accepts or rejects a given input. □ 

Getting the exact difference is actually overkill because a coarse approximation 
$d$ need only separate the case $d^2/2^h > 2/3$ from $d^2/2^h < 1/3$. Note, however, 
that the approximations must be within some $(1 + \epsilon)$ factor of $d$, not just 
within such a factor of $N_R[+1 \mid a; b]$ and $N_R[-1 \mid a; b]$ individually. Because 
$N_R[+1 \mid a; b] - N_R[-1 \mid a; b]$ is divided by $\sqrt{2^h}$, not by $2^h$, the two will always 
come closer to canceling than a fixed-factor approximation to either can distinguish. 
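
A small numeric illustration of this cancellation follows. The counts are hypothetical, chosen only to show the scale involved: with $h = 20$ the two counts in an accepting case and a rejecting case differ by less than a tenth of a percent, yet the resulting probabilities are 1 and 0.

```python
def acceptance_probability(n_plus, n_minus, h):
    """Square of the count difference, divided by 2**h."""
    return (n_plus - n_minus) ** 2 / 2 ** h

h = 20                          # hypothetical number of Hadamard gates
total = 2 ** h                  # assume every assignment yields +1 or -1
accept = (total // 2 + 512, total // 2 - 512)   # difference 1024 = sqrt(2**h)
reject = (total // 2, total // 2)               # difference 0

print(acceptance_probability(*accept, h))       # 1.0
print(acceptance_probability(*reject, h))       # 0.0

# Relative gap between the +1 counts of the two scenarios.
relative_gap = (accept[0] - reject[0]) / reject[0]
print(relative_gap < 0.001)                     # True
```

An estimate of each count that is accurate to within, say, one percent therefore cannot tell these two cases apart.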

No containment relation is known between BQP and NP. In particular, 
although counting solutions of polynomials is NP-hard, and although this theorem 
implies that counting solutions is enough to determine whether a quantum 
circuit will accept, this does not mean that quantum circuits can solve NP-hard 
problems. Claims to this effect have been notorious. More notable, however, 
have been efforts to harness quantum processes to find heuristic solutions in 
many cases that are perhaps indelibly harder to obtain classically. Hence, there 
is interest in the possible heuristic solvability of some of the resulting problems 
in polynomial algebra. 


16.7 Problems 

16.1. Suppose $f$ and $g$ are feasible functions and $L$ is a language such that for 
all $x$, $g(x)$ is a polynomial $p(y_1,\dots,y_m)$ such that $x \in L \iff N_p[1] - N_p[0] > f(x)$. 
Prove that $L$ is in PP. 

16.2. Consider a NAND gate $g$ with inputs $u$ and $v$, and any of its output wires 
$w$. Write a Boolean formula $\phi$ in literals $\pm u$, $\pm v$, $\pm w$ that is satisfied by exactly 
those assignments for which $w$ is the correct output of NAND$(u,v)$. 

16.3. Suppose $C$ is a classical circuit of NAND gates with inputs $x_1,\dots,x_n$ and 
$y_1,\dots,y_m$ and output wire $w_0$. Show how to construct a Boolean formula $\phi_C$ 
that is satisfiable exactly when there exist $x \in \{0,1\}^n$ and $y \in \{0,1\}^m$ such that 
$C(xy)$ outputs 1. How many extra variables do you need in terms of wires in 
$C$? 

16.4. Conclude that for any uniform family of circuits $C_n$, the language $L$ 
defined for all $x$ by 

$$x \in L \iff (\exists y)\; C_n(xy) = 1$$

feasibly reduces to the language SAT of Boolean formulas $\phi$ that are satisfiable. 
Note that you can take $\phi_C$ from problem 16.3 and substitute for the 
particular bit-values of $x$. This yields the Cook-Levin theorem that SAT is 
NP-complete. 

16.5. Show further that $\phi_C$ and $\phi$ can be written as conjuncts of clauses, where 
each clause is a single literal (such as $w_0$ or $\pm x_i$), a disjunction of two positive 
literals, or a disjunction of three negative literals. This is a form of the problem 
called 3SAT, which is likewise NP-complete. 

16.6. Now convert the $\phi$ in problem 16.2 into a polynomial $p(u, v, w)$ such that 
$p(a_1, a_2, a_3) = 0$ for exactly those assignments $a_1a_2a_3$ that satisfy $\phi$. What is 
its degree? Note that a NAND gate is simulated on binary arguments $u, v$ by 
the polynomial $1 - uv$, but what’s more important than translating the gate is 
translating the specification that the gate is working correctly. 

16.7. Build on the last problem to show a reduction from SAT (or alternately 
from 3SAT) to the solution-existence problem for polynomials, which is question 
(b) in section 16.2. Formally, this makes these two problems meet the 
definition given there of belonging to NP, so that they are NP-complete. 

16.8. Conclude from all this that if you define NP initially via polynomial-time 
nondeterministic machines or circuits, then the solution-existence problem is 
thereby shown to be NP-complete. 

16.9. Suppose we add a NOT gate to a quantum circuit. Work out the change 
to labels and the polynomial P needed to make the algorithm and verification 
of lemma 16.5 hold for this gate as well. 

16.10. Same as the last exercise, except now we add a CNOT gate. 

16.11. Suppose we have a kind of gate $W$ for which we have an associated 
change of labels on the qubit lines it affects and update factors for the polynomial 
$P$ (or rather $Q$). Show how to derive the labels and factors for the C$W$ 
(i.e., Controlled-$W$) gate based on this information alone. 

16.12. Add a case to the labeling algorithm and to the proof of lemma 16.5 for 
a NOT gate. 

16.13. Complete the case of the proof of lemma 16.5 for a CNOT gate. 

16.14. Give the polynomials $P_C$ and $Q_C$ for the trivial one-qubit circuit with 
two Hadamard gates: 

[Circuit diagram: a single qubit line with two Hadamard gates in sequence.] 

Show, using $R_C$ as well, that this circuit is equivalent to the identity for all 
measurements. 

16.15. Now consider again the circuit from section 6.3: 



Show that the conclusion of problem 16.14 does not hold for the first qubit 
by showing nonzero amplitude difference for case(s) with $z_1 \neq x_1$. This shows 
that the inclusion of equality constraints in $R_C$ compared with $P_C$ or $Q_C$ matters 
in the equations. Use $R_C$ to compute the amplitudes of all outcomes on input 
$x = 00$. 

16.16. Suppose we introduce a so-called CZ gate, which incidentally is the 
Grover oracle marking the string 11: 

$$\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & -1
\end{bmatrix}$$

What happens for this gate in the labeling algorithm and the proof of 
lemma 16.5? 

16.17. Modify the definition of Rc and/or the proof of theorem 16.7 for the 
regime where we measure only the value of the first qubit. 

16.18. Modify the additive “$Q$ version” of the labeling algorithm to allow the 
phase polynomial to have values 0, 1, 2, 3 in the integers with addition modulo 
4, which gives a logarithmic representation of the multiplicative structure of 
the values $1, i, -1, -i$. Use this to incorporate into the labeling algorithm the 
so-called S-gate defined by 

$$S = \begin{bmatrix} 1 & 0 \\ 0 & i \end{bmatrix}.$$

The analogous version of lemma 16.6 becomes more complicated but retains 
the essence that the acceptance probability is a simple function of the values 
$N_R[k \mid a; b]$ for $k = 0,1,2,3$. 

16.19. Observe that when there are no Toffoli gates, that is, when only the gates 
mentioned in the above problems plus Hadamard gates are used, the polynomials 
$Q$ obtained have degree no higher than two. Use the known theorem that 
solution-counting is in deterministic polynomial time for such polynomials 
over the integers modulo 4 to conclude the Gottesman-Knill theorem: every 
so-called stabilizer circuit of H, S, CZ gates only (plus X, CNOT, and some 
others) of size $s$ can be simulated in classical polynomial time $s^{O(1)}$. 

16.20. Let us extend the additive polynomial simulation to circuits with T-gates 
to work over $\mathbb{Z}_8$ as follows: For a new Hadamard gate on a qubit line 
with label $u$, allocate the fresh variable $y_j$ as the new label and add $4uy_j$ to 
the phase polynomial $Q$. (The general rule modulo $2^r$ is to add $2^{r-1}uy_j$.) For a 
T-gate, leave the label alone but add $u$ to $Q$. State and prove the appropriate 
version of lemma 16.6 for this representation. (Thus, the “Crescent Phils” and 
“Gibbous Phils” mentioned in section 7.5 are formally taken care of here.) 

16.21. With reference to problem 16.20, analyze the one-qubit commutator 
circuit formed by $HTHT^{-1}$. What are the probabilities of measuring 0 and 1 
on input $e_0$ and $e_1$, respectively? 


16.8 Summary and Notes 

The accepted definition of BQP comes from the appropriately named 1997 
article “Quantum Complexity Theory” by Bernstein and Vazirani (1997). Its 
polynomial-space upper bound was improved to PP by Adleman et al. (1997). 
The sharper observation that it belongs to a definitionally smaller class (called 
AWPP) characterized by taking the difference of two #P functions was set 
down by Fortnow and Rogers (1999). Our proof of this follows Dawson et al. 
(2004), while the example preceding lemma 16.5 comes also from Gerdt and 
Severyanov (2006). A different, novel, and informative proof of BQP $\subseteq$ PP 
via the concept of “post-selection” comes from Aaronson (2005), and this also 
simplified the proof of known closure properties of PP. For the Solovay-Kitaev 
theorem we refer to Dawson and Nielsen (2006). The Gottesman-Knill theorem 
was first set down by Gottesman (1998), and the first of several improvements 
to the algorithm came from Aaronson and Gottesman (2004). The theorem 
about counting solutions of quadratic polynomials is joint work of this 
text’s first author (Cai et al., 2010), while the relevance to Gottesman-Knill 
was noticed by the second author. Much further work of high interest is due 
to Aaronson (2010) and Aaronson and Arkhipov (2010); see also the recent 
popular book by Aaronson (2013). 



Beyond 


Our goal has been to get you started understanding some of the basic concepts 
from quantum algorithms. We hope that you now have a working understand¬ 
ing of them. 

One of the takeaways is that the quantum algorithms are not that complicated, 
nor are they based on mysterious ideas. They are not even necessarily 
complex, neither in the sense of needing complex numbers nor in having many 
loops and high running times. We hope that you now see that quantum algorithms 
in general have a simple structure and that their analysis is quite standard. 
The proofs that show they work use basic tools from linear algebra and 
number theory—nothing strange or exotic is required. 


17.1 Reviewing the Algorithms 

For each quantum algorithm, we have shown a respect in which it surpasses a 
classical algorithm for the same task. This started by counting the number of 
evaluations of a function $f$ given as a black-box parameter of the task. With 
Deutsch’s algorithm, we saved one evaluation of $f$ in the worst case by using 
the ability in the quantum world to apply $f$ to a linear combination of the basis 
vectors denoting argument strings. Then with the Deutsch-Jozsa algorithm, 
we saved more such queries, and with Simon’s algorithm, we saved exponentially 
many even in the expected case. Shor’s algorithm surpassed in raw time 
any known classical algorithm, while Grover search and quantum walk search 
algorithms confer a definite polynomial savings. 

The first question to consider, going beyond these algorithms, is what is the 
common origin of these savings? What is the quantum core of them? We offer 
the following candidate for “the” common feature: 


Quantum computation is a great amplifier of slight 
regularities and deviations. 


In a Grover search, the initial deviation is as small as possible—it can be 
just one solution that flips the sign of one component in an exponential-sized 
start vector. This gives the minimum improvement: $O(2^{n/2})$ iterations in place 
of $O(2^n)$ expected time. In quantum walks, the linear time for spreading (compared 
with quadratic time for classical walks on undirected graphs) preserves 
the amplification in the simple Grover setting and focuses the mechanism of 
amplification for greater savings when the solution set size can be inflated 
while keeping small time for updates between evaluations of the search function. 
The greater savings in Shor’s algorithm can then be viewed as arising 
because periodicity is a greater regularity than the presence of a solution. It 
was still far from obvious that a polynomial amount of work would suffice to 
amplify it, but Simon’s algorithm provided a hint. In Simon’s algorithm, the 
reason a linear amount of work suffices with high probability is that $2n$ random 
choices usually suffice to find a basis of $n$ independent binary vectors 
in $n$-dimensional Hilbert space, even though the set of all binary vectors has 
exponential size. This goes also for related hidden subgroup problems, which 
we have passed over, except for a mention in problems 10.6-10.8, and which 
constitute one possible line of further study. 
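
The claim about $2n$ random choices can be probed empirically. The sketch below is a simplified experiment, not Simon's algorithm itself: it assumes the samples are uniform over all of $\{0,1\}^n$ (encoded as Python integers) and estimates how often $2n$ of them contain $n$ vectors that are linearly independent over the two-element field. The function names, the seed, and the parameters are illustrative choices.

```python
import random

def rank_gf2(vectors):
    """Rank over GF(2) of vectors encoded as integers (bit i = coordinate i)."""
    basis = []                  # reduced vectors with pairwise distinct leading bits
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)   # XOR with b exactly when that clears b's leading bit
        if v:
            basis.append(v)
    return len(basis)

def spans_with_2n_samples(n, rng):
    """Draw 2n uniform vectors from {0,1}^n and report whether they span."""
    samples = [rng.getrandbits(n) for _ in range(2 * n)]
    return rank_gf2(samples) == n

rng = random.Random(0)          # fixed seed so the run is repeatable
n, trials = 10, 2000
hits = sum(spans_with_2n_samples(n, rng) for _ in range(trials))
print(hits / trials)
```

A union bound over the $2^n - 1$ nonzero linear functionals, each vanishing on all $2n$ samples with probability $2^{-2n}$, puts the failure probability below $2^{-n}$, so the printed fraction should be very close to 1.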

What will it take to find a great new kind of quantum algorithm? It may need 
first thinking up a new kind of amplification. 


17.2 Some Further Topics 

To sum up the various paths from this point on, all of the following can be 
amplified for further study without canceling each other out. There is much 
more to learn from what is already known and much more to learn that is 
being discovered right now. 

• More about algorithms. All of the algorithms we have covered, especially 
from Simon’s algorithm onward, have further developments. For example, 
some new results by Belovs et al. (2013) and Jeffery et al. (2013) have come 
from the idea of nesting quantum walks in a quantum manner, not just classically 
as with the recursion we sketched for the chess/NAND-tree example in 
chapter 15. Movement toward regarding quantum routines more as enabling 
sampling from classically difficult distributions is also current. 

• More about computational complexity. We have only scratched the surface 
of complexity classes neighboring BQP whose relationships to each other, let 
alone to quantum classes, form one of the great mysteries of our age. Besides 
whether BQP $\neq$ BPP and whether factoring is in the difference, there are the 
questions of whether BQP contains the graph isomorphism problem and whether 
it is contained in any level of the polynomial hierarchy above NP. 






• More about quantum physics. There are a plethora of possibilities. Before 
you go any further, you will need to internalize Dirac’s bra-ket-etc. and 
other physics notation. Then it will help to study evolution according to 
Schrödinger’s equation. This will open the way to other physical models, 
such as adiabatic quantum computation. A further physical topic is topological 
quantum computation. With all of these models, there is the question of 
building physical quantum computers. We hosted a year-long debate on this 
issue between Gil Kalai and Aram Harrow on the Gödel’s Lost Letter blog in 
2012. 

• More about quantum implementations and circuits. There is a huge literature 
on both these related topics that you can read more about. This will 
explain and show in greater detail why all the unitary transforms we called 
feasible indeed are. Other longer, more standard textbooks than ours are places 
to start. There are many new papers and expositions on the engineering challenges 
every month. The most important topic not included in this text is the 
quantum fault-tolerance theorem and its beautiful use of error-correcting 
codes, as may be found in all the larger texts we have referenced. 

• More about quantum information theory, communication, and cryptography. 
They go together, are included in all of the longer texts, and provide 
applications here and now. We covered some of the basics in section 8.3 and 
mentioned Holevo’s theorem that transmitting an $n$-qubit quantum state in isolation 
can confer at most $n$ bits of classical information. The idea of quantum 
cryptography is to build new types of security systems that rely not on the 
hardness of some computational problem—like classical cryptography does— 
but on the special behavior of quantum systems. Actual usable physical devices 
have been built and put into service to secure the most sensitive financial transactions. 

• More about related mathematical and scientific areas. Shor’s algorithm 
certainly shows how several areas of number theory are needed to channel 
the quantum outputs. Quantum computing may be a coal-mine canary for the 
prospects of various “theories of everything” in science. Biological processes 
from DNA upward are already profitably regarded as computational, and one 
can expect quantum imprints. The title and blurb of the article by Ball (2011) 
say it all: “Physics of life: The dawn of quantum biology. The key to practical 
quantum computing and high-efficiency solar cells may lie in the messy green 
world outside the physics lab.” 

Specific to linear algebra, we mention two nifty theorems from the beautiful 
theory of spectral decompositions: 

1. Every self-adjoint matrix $A$ on $\mathbb{C}^n$ has an orthonormal basis of eigenvectors 
$x_i$ with real eigenvalues $\lambda_i$, such that $e^{iA}$ is well defined (using the 
same Dirac notation for projectors via outer products that was discussed in 
section 6.4) by: 

$$e^{iA} = e^{i\lambda_1}|x_1\rangle\langle x_1| + \cdots + e^{i\lambda_n}|x_n\rangle\langle x_n|.$$

2. Every unitary matrix $U$ arises as $U = e^{iA}$ for some self-adjoint $A$, so 
that each eigenvalue $e^{i\lambda_i}$ of $U$ is a unit complex number identified with the 
angle $\lambda_i$. 
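
As a small numerical illustration of the spectral formula, the sketch below builds $e^{iA}$ for the $2 \times 2$ self-adjoint matrix $A = \theta X$, where $X$ is the NOT (Pauli $X$) matrix, from its eigenvalues $\pm\theta$ and eigenvectors $(1, \pm 1)/\sqrt{2}$, and checks that the result is unitary. The choice of $A$, the angle, and the helper names are assumptions made for the example.

```python
import cmath
import math

def outer(v):
    """Projector |v><v| for a vector v given as a list of numbers."""
    return [[a * b.conjugate() for b in v] for a in v]

def exp_i_from_spectrum(eigenpairs):
    """Sum of e^{i*lam} |x><x| over (lam, x) pairs of an orthonormal eigenbasis."""
    n = len(eigenpairs[0][1])
    U = [[0j] * n for _ in range(n)]
    for lam, x in eigenpairs:
        phase, proj = cmath.exp(1j * lam), outer(x)
        for r in range(n):
            for c in range(n):
                U[r][c] += phase * proj[r][c]
    return U

def is_unitary(U, tol=1e-12):
    """Check that the rows of U are orthonormal, up to rounding."""
    n = len(U)
    for r in range(n):
        for c in range(n):
            dot = sum(U[r][k] * U[c][k].conjugate() for k in range(n))
            if abs(dot - (1 if r == c else 0)) > tol:
                return False
    return True

theta = 0.7                     # arbitrary illustrative angle
s = 1 / math.sqrt(2)
# A = theta * X has eigenvalues +theta and -theta.
pairs = [(theta, [s, s]), (-theta, [s, -s])]
U = exp_i_from_spectrum(pairs)
print(is_unitary(U))
```

The entries come out as $\cos\theta$ on the diagonal and $i\sin\theta$ off it, that is, $e^{i\theta X} = \cos\theta\, I + i\sin\theta\, X$.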

This opens the way to understanding things like Schrödinger’s equation and 
the role of Planck’s constant (called $h$ or $\hbar = \frac{h}{2\pi}$) in topics like the Heisenberg 
Uncertainty Principle. These are advanced physics topics, but a nonphysicist 
can draw on linear algebra and ideas of computations to approach them. For 
instance, the general solution to Schrödinger’s equation in the case of a 
time-independent Hamiltonian operator $H$ is 

$$U(t) = e^{-iHt/\hbar}.$$

Here $H$ does not stand for Hadamard, but, like Hadamard transforms, the 
Hamiltonian operators are self-adjoint, so we can apply the above to understand 
the whole right-hand side as a unitary operator for any $t$, which is what 
the left-hand side says it is. Now you can see ways in which this is like a 
computation. Then we can even ask questions like how far all this is analogous to 
the way our additive simulation in section 16.5 works with the logarithms of 
the quantities in the multiplicative simulation of section 16.4. 

There are many more topics beyond this book than we can list. One general 
phenomenon is that many important concepts in classical computation have 
been reissued in “quantum” versions. Besides BQP being the quantum version 
of BPP, there is a quantum analogue QNP of NP and so on for many 
other complexity classes. There are quantum finite automata, quantum cellular 
automata, and even quantum formal grammars. Quantum communication complexity 
and information assurance has evolved into a big field, with both quantum 
versions of classical protocols and distinctively quantum situations. We 
end, however, by looking inward at what is distinctively “quantum” about the 
algorithms we have studied, quantum particularly meaning beyond the known 
reach of classical computations. 


17.3 The “Quantum” in the Algorithms 

We began with the question: can we build a physical quantum computer to 
confer the promised advantages of quantum algorithms? We have not covered 
the engineering challenges in this text or the quantum fault-tolerance theorem. 
That theorem sets concrete rather than asymptotic physical thresholds for making 
quantum error-correction work, which is a prerequisite for enabling current 
designs of quantum computers to be engineered. Before we get there, however, 
we can pose a simpler question based on what we have learned: 


Where do the advantages of quantum algorithms come from? 


Quantum mechanics gives a new lever for algorithms. Archimedes famously 
said, “Give me a place to stand, and I shall move the Earth”—meaning a place 
far enough away for a long enough lever. His Greek words did not say, however, 
where he would find a base for the lever. With quantum states, we certainly 
have exponentially long vectors, but what is the base—where does the 
power reside? We consider some possible questions and answers: 


• Is the power already in the exponentially long notation? Not in terms of
information: by Holevo’s theorem mentioned above, transmitting an n-qubit
quantum state can confer at most n bits of classical information. This is so
even if the state is a feasible relational superposition
$\mathbf{s}_R = \sum_{x,y \,:\, R(x,y)} \mathbf{e}_x \otimes \mathbf{e}_y$
(suitably normalized). This appears to have information about values related
to every x, but the state allows extracting only one such entire value via
measurement. More precisely, it yields at most $|x| + |y|$ bits of information.
This limit also matters if we try to encode a general n-vertex graph with
$\Omega(n^2)$ edges using only n qubits. We wonder which relations R other
than the graphs of feasible functions may make $\mathbf{s}_R$ feasible to
prepare.
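
As a small illustration of the measurement limit, the following sketch builds the superposition for the special case where the relation is the graph of a function and then samples one measurement outcome. The function `f`, the sizes `n` and `m`, and the helper names are our own illustrative assumptions, and the dictionary representation is only a classical bookkeeping device, not a claim about how such a state is prepared.

```python
import math
import random

def functional_superposition(f, n, m):
    """Amplitudes of (1/sqrt(N)) * sum_x e_x (x) e_{f(x)} with N = 2**n.

    Returned as a dict mapping the basis label (x, y) to its amplitude;
    labels that are absent have amplitude zero.
    """
    N = 2 ** n
    amp = 1.0 / math.sqrt(N)
    state = {}
    for x in range(N):
        y = f(x)
        assert 0 <= y < 2 ** m, "f must map into m-bit values"
        state[(x, y)] = amp
    return state

def measure(state, rng):
    """Sample one basis label with probability |amplitude|**2."""
    labels = list(state.keys())
    weights = [abs(a) ** 2 for a in state.values()]
    return rng.choices(labels, weights=weights, k=1)[0]

n, m = 4, 4
f = lambda x: (3 * x + 1) % 16        # hypothetical example function
state = functional_superposition(f, n, m)
x, y = measure(state, random.Random(0))
# The state "mentions" all 2**n pairs, yet one measurement returns a single
# pair (x, f(x)), that is, n + m classical bits.
```

Repeating the measurement on a freshly prepared state yields another single pair; nothing in the sketch lets one run read out the whole table of `f`.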


• Is it in the ability to generate entanglement? Not alone. Although Hadamard
and CNOT gates can generate lots of entanglement and together with the
S-gate can create all the error-correcting codes needed for the fault-tolerance
theorem, nevertheless by the Gottesman-Knill theorem outlined in the exercises
of chapter 16, circuits of these gates all have feasible classical simulations.
Measures of entanglement, especially for n-partite not just binary systems,
have no agreed-on standard definition, while some working definitions have
high complexity.
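
To make the first claim concrete, here is a minimal sketch of a Hadamard gate followed by a CNOT gate turning the product state e_00 into an entangled state. The helper names (`kron`, `apply`, `product_defect`) are our own; the test used is the elementary fact that a two-qubit pure state with amplitudes a00, a01, a10, a11 is a product state exactly when a00·a11 − a01·a10 = 0. It is not a general measure of entanglement.

```python
import math

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]            # single-qubit Hadamard
I2 = [[1, 0], [0, 1]]            # single-qubit identity
# CNOT with the first qubit as control; basis order 00, 01, 10, 11.
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

def kron(a, b):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def apply(matrix, vec):
    """Matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def product_defect(state):
    """a00*a11 - a01*a10; zero exactly when the 2-qubit state is a product."""
    a00, a01, a10, a11 = state
    return a00 * a11 - a01 * a10

e00 = [1, 0, 0, 0]
after_h = apply(kron(H, I2), e00)     # still a product state
bell = apply(CNOT, after_h)           # (e00 + e11) / sqrt(2), entangled
```

The explicit 4-entry vectors are used here only for readability. The Gottesman-Knill simulation mentioned above does not work this way; it tracks a compact description of the state rather than its amplitudes.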


• Does it have to do with interference? Definitely yes, but... The “but” is
that exponential savings from interference presuppose that we have harnessed
exponential power somehow in generating the cancellations, which throws
us back into the points and discussions above. However, simple, nonexponential
interference yields many working applications in quantum communication and
can be experienced by anyone with two sheets of polarizing material.
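
The simplest nonexponential interference we can exhibit in the linear-algebra setting is a single qubit passed through two Hadamard gates. The sketch below, with a helper name `paths` of our own choosing, lists the amplitude contributed by each intermediate basis state: the two contributions to e_1 cancel, and the two contributions to e_0 reinforce.

```python
import math

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]   # single-qubit Hadamard

def paths(target, start=0):
    """Contributions to the (target, start) entry of H*H, one per
    intermediate basis state k: H[target][k] * H[k][start]."""
    return [H[target][k] * H[k][start] for k in (0, 1)]

to_e0 = paths(0)   # both contributions positive: constructive interference
to_e1 = paths(1)   # equal magnitude, opposite sign: destructive interference
```

Summing each list gives the corresponding amplitude of H·H·e_0, so the state returns to e_0 with certainty even though the intermediate state is an equal superposition.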

• Is it in superpositions such as
$\mathbf{j}_n = \frac{1}{\sqrt{2^n}} \sum_{x \in \{0,1\}^n} \mathbf{e}_x$,
which despite its exponential size as a formula, is feasibly obtained by
applying the Hadamard transform? At least in part, but... The “but” this time
is that working on $\mathbf{j}_n$ is not the same as having true “exponential
parallelism.” If it were, then we would expect quantum algorithms to be able
to solve NP-complete problems by making unstructured (Grover) search run in
polynomial time. Not only is there no real evidence for
$\mathsf{NP} \subseteq \mathsf{BQP}$, but we have noted that $\Omega(2^{n/2})$
is a lower bound on the number of queries needed by Grover’s algorithm
specifically and hence on its running time.
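
The contrast between the cheap preparation and the expensive search can be put in numbers. In this sketch, `uniform_superposition` builds the uniform superposition with one Hadamard factor per qubit, and `grover_iterations` evaluates the standard estimate of about (π/4)·√(2^n) Grover iterations, under our assumption of a single marked item. Both function names are illustrative.

```python
import math

def uniform_superposition(n):
    """Amplitudes of the uniform superposition on n qubits, built as the
    n-fold tensor power of H e_0 = (e_0 + e_1) / sqrt(2)."""
    s = 1 / math.sqrt(2)
    state = [1.0]
    for _ in range(n):                      # one Hadamard per qubit
        state = [a * s for a in state for _bit in (0, 1)]
    return state                            # 2**n classical amplitudes

def grover_iterations(n):
    """Standard (pi/4) * sqrt(2**n) iteration estimate, assuming exactly
    one marked item among 2**n candidates."""
    return math.floor(math.pi / 4 * math.sqrt(2 ** n))

j5 = uniform_superposition(5)
counts = {n: grover_iterations(n) for n in (10, 20, 40)}
```

The number of gates in the preparation grows linearly in n, while the iteration count roughly doubles each time n grows by 2, which is the exponential behavior the lower bound says cannot be avoided for unstructured search.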


• Is it specific to the quantum Fourier transform? Oddly there is evidence for
“no.” First, we have noted, although not proved, that Shor’s algorithm requires
only moderately coarse approximations to the QFT, and that these are readily
obtained via circuits of other fundamental gates, even Hadamard and Toffoli
gates alone. These results do presuppose the ability to carry out the n-qubit
Hadamard transform. Second, because Shor’s algorithm does a complete
measurement immediately after applying the QFT, the same distribution can be
obtained by iteratively measuring each qubit after applying just $\mathbf{H}_2$
and $\mathbf{R}_\theta$ to it, where the rotation angle $\theta$ depends on the
results of previous measurements, for which see Preskill (2004). Thus, for
factoring, the QFT can “deconstruct” into simple single-qubit operations and
measurements. Third, there is also classical super-polynomial power in Simon’s
algorithm, although (a) the problem it solves does not have as strong evidence
as Shor’s of being beyond classical reach, and (b) denying the classical
solvers for Simon’s problem the ability to use field extensions and linear
combinations may give an unfair comparison. The fourth line of evidence
against the QFT alone being the fulcrum comes next.

• Is it in the functional superpositions? Referencing a famous Peanuts cartoon,
we are tempted to say, “That’s it!” In the cartoon, Charlie Brown is in
session at Lucy’s psychiatry stand, and she enumerates names of phobias to see
which could be troubling him. When she comes to “pantophobia” (the fear of
everything), Charlie’s exclamation knocks both off their chairs.

There are several senses in which the full QFT matrix $\mathbf{F}_N$ has
feasible classical simulations, such as when it is given a product state (any
product state, not necessarily one of the standard basis states) as input.
These simulations are not known to apply to functional superposition states;
if they did, then factoring would be in BPP. See Aharonov et al. (2006), Jozsa
(2006), Markov and Shi (2008), van den Nest (2013). The last of these
references asserts that the power must reside somewhere in the interface
between the quantum and classical components of Shor’s algorithm, and the
superpositions via the Hadamard transform set up this interface. At least this
serves as our apology for not making the super-classical power leap right off
the page in chapter 11 and also explains the need to follow the linear algebra
and the classical number-theoretic parts of the argument closely.

• So where else can it be? Perhaps Lucy is right—we fear that everything 
needs to be considered together to find the quantum power. The lever may 
have no single base. Rather, we may need an Archimedean lens by which to 
focus the ensemble of components we have shown. Perhaps the focus can be an 
invariant from algebraic geometry that is expressible via ideals of polynomials 
over some ring or field. An algebraic-geometric invariant is the source of what 
remain the only general super-linear lower bounds on arithmetical circuits for 
low-degree polynomial functions, which were obtained by Baur and Strassen 
(1982) following Strassen (1973). We suspect that this algebraic mechanism 
should have some natural role in metering quantum circuits as well. 

If we step back from the deep difficulties of computational complexity theory
and stay with subjects in communication and information such as quantum
cryptography, then the power is easy to find, combining entanglement
and interference and human-computer interaction. Although our emphasis on
quantum algorithms has stood apart from these subjects’ concerns, we hope
the coverage here of quantum computing fundamentals has opened a way into
them with increased amplitude. Whatever your path, provided it augments and
does not cancel, we wish for it a measure of success.